1 Introduction

This chapter offers a brief introduction to what is often called the convex-operational approach 
to the foundations of quantum mechanics, and reviews selected results, mostly by ourselves and 
collaborators, obtained using that approach. Broadly speaking, the goal of research in this vein is to 
locate quantum mechanics within a very much more general, but conceptually very straightforward, 
generalization of classical probability theory. The hope is that, by regarding QM from the outside,
so to say, we shall be able to understand it more clearly. And, in fact, this proves to be the case.

The phrase "convex-operational" deserves some comment. The approach discussed here is "convex"
in that it takes the space of states of a physical system to be a convex set (to accommodate
the formation of probabilistic mixtures), and draws conclusions from the geometry of this set. It
is "operational" in its acceptance of measurements and their outcomes as part of its primitive
conceptual apparatus, and in its identification of states with probability weights on measurement
outcomes. In this sense, it is conceptually very conservative, differing from classical probability only
in that it is not assumed that all measurements can be made simultaneously.

From this starting point, one is led very naturally to a mathematical framework for a post-classical
probability theory, which, while varying idiomatically from author to author [27, 29, 32, 39, 40, 46, 49],
is more or less canonical. About the first third of what follows is devoted to a
detailed discussion of the structure of individual probabilistic models in this framework. Here we
exhibit a range of simple non-classical examples, many of them quite different from either classical
or quantum probabilistic models. At the same time, we try to bring some order to this diversity, by
showing that essentially any probabilistic model can be represented in a natural way in terms of an
ordered real vector space and its dual, and that processes operating on and between models can be
represented by positive linear maps between these associated spaces.

Starting in Section 3, we focus on composites of probabilistic models, subject to a natural non-signaling
constraint. As we shall see, the phenomenon of entanglement, often regarded as a hallmark
of quantum mechanics, is actually a rather generic feature of non-signaling composites of non-classical
state spaces, and thus more a marker of non-classicality than of "quantumness" per se. Since
quantum information theory treats entanglement as a resource, the question then arises of which
quantum-information-theoretic results can be made to work in a more general probabilistic setting.
Section 4 reviews some work in this direction, particularly the generalization of the no-cloning and
no-broadcasting theorems of [9, 10], and the analysis of teleportation and entanglement-swapping
protocols in terms of conditional states, following [IT].

If many non-classical features of QM are not so much quantum as generically non-classical, 
what does single out QM? The question of how to characterize QM in operational or probabilistic 
terms is a very old one. After many decades of hard-won partial results in this direction (e.g.,
[4JE1I22 , 37, 54, 63, 75]), the past decade has produced a slew of novel derivations of finite-dimensional 
QM from fairly simple, transparent and plausible, assumptions [23l [26j [39l US ES] (to cite just a 
few). In Section 5, we outline one of these, which recovers the Jordan structure of finite-dimensional 
quantum theory from symmetry considerations; the specific C*-algebraic machinery of standard
quantum mechanics is then singled out by considerations involving the formation of composite
systems. The key tools here are a classical representation theorem for homogeneous, self-dual cones,
due to M. Koecher and E. Vinberg [121165], and a theorem about tensor products of Jordan algebras
due to H. Hanche-Olsen [35].

Since the aim of this paper is to provide a brief and accessible introduction to this material, 
we make some simplifying assumptions. The most important is that we focus entirely on finite- 
dimensional models, even though large parts of the apparatus developed here work perfectly well 
(and were first developed) in an infinite-dimensional setting. Further assumptions will be spelled 
out as we go. 

Notational conventions Real vector spaces are indicated generically by bold capitals $\mathbf{E}$, $\mathbf{F}$, etc. The
space of linear mappings $\mathbf{E} \to \mathbf{F}$ is denoted by $L(\mathbf{E}, \mathbf{F})$; $\mathbf{E}^*$ denotes the dual space of $\mathbf{E}$. If $\mathcal{H}$ is a
real or complex Hilbert space, $L_h(\mathcal{H})$ stands for the space of bounded Hermitian operators on $\mathcal{H}$.

If $X$ is a set, $\mathbb{R}^X$ denotes the vector space of all real-valued functions on $X$.

2 Elementary probability theory, classical and otherwise 

If $\mathcal{H}$ is a Hilbert space, representing a quantum-mechanical system, then each state of that system
is represented by a density operator $\rho$. A possible measurement outcome is represented by an effect,
i.e., a positive hermitian operator $a$ with $0 \leq a \leq 1$; $\mathrm{Tr}(\rho a)$ gives the probability that $a$ will occur
(if measured) when the state $\rho$ obtains. This probabilistic apparatus generalizes that of classical
probability theory, in that if we fix an observable, that is, a set $\{a_1, \ldots, a_n\}$ of effects summing to $1$,
we can understand this as a model of a single, discrete, classical statistical experiment, on which each
state $\rho$ defines a probability weight $p(i) := \mathrm{Tr}(\rho a_i)$. The novelty here is that, in general, a pair of
observables $\{a_1, \ldots, a_n\}$ and $\{b_1, \ldots, b_m\}$ is not co-measurable. In classical probability theory, it is always
assumed (if often tacitly) that any pair of outcome-sets $E_1$ and $E_2$ admits a simultaneous refinement,
that is, both can be represented as partitions or "coarse-grainings" of some third outcome-set $F$. In
quantum-probability theory, this is not the case. Unless the operators $a_i$ and $b_j$ all commute, there
will be no third observable of which $\{a_i\}$ and $\{b_j\}$ are both coarse-grainings (at least when the $a_i$ and $b_j$ are projections).
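The commutation condition is easy to test numerically. The following minimal sketch assumes a qubit, $\mathcal{H} = \mathbb{C}^2$, and takes as its two observables the standard basis and the basis rotated by 45 degrees (our choice, for illustration only); the helper names `matmul`, `commutator` and `projector` are hypothetical, and matrices are plain nested lists of Python complex numbers.

```python
# Sketch: 2x2 matrices as nested lists of Python complex numbers (no external libraries).

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def commutator(a, b):
    """[a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


def projector(v):
    """Rank-one projection |v><v| onto a unit vector v in C^2."""
    return [[v[i] * v[j].conjugate() for j in range(2)] for i in range(2)]


s = 2 ** -0.5
# Effect "outcome 0" of the standard-basis observable {P0, 1 - P0}.
p0 = projector([1 + 0j, 0j])
# Effect "outcome +" of the rotated observable {Pplus, 1 - Pplus}.
pplus = projector([s + 0j, s + 0j])

c = commutator(p0, pplus)   # non-zero: the two observables are not co-measurable
same = commutator(p0, p0)   # an effect always commutes with itself
```

A non-zero commutator between an effect of one projective observable and an effect of the other is enough to rule out a common refinement.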

So, quantum probability theory forgoes the assumption of co-measurability, which is a tenet of
classical probability theory. And, indeed, in retrospect, the latter is surely a contingent matter, so it 
is not so very radical a step to renounce it. It is not so much the intuitive notion of probability that is 
post-classical, as the overall framework, which is in a precise sense a generalization of the framework 
of the classical mathematical theory of probability. On the other hand, quantum probability theory 
replaces the simple axiom of co-measurability with the elaborate apparatus of the Hilbert space 
and its associated space of Hermitian operators. As a framework for an autonomous probability 
calculus, this seems less than perfectly well motivated, and one can wonder whether, and why, 
it is necessary. A sensible way to approach this question is simply to drop the co-measurability 
assumption, without making any special assumptions to replace it. The resulting post-classical
probability theory is a vast, poorly explored, and rather wild region, within which even quantum 
probability theory seems rather tame. 

2.1 Test spaces and probabilistic models 

There are many more or less equivalent, but stylistically diverse, ways of formulating a post-classical 
probability theory. The approach we take here (due originally to C. H. Randall and D. J. Foulis 
[31, 32]) begins with a bare minimum of raw material.

Definition 1. A test space is a pair $(X, \mathfrak{M})$ where $X$ is a set of outcomes and $\mathfrak{M}$ is a covering of
$X$ by non-empty sets called tests. A probability weight on $(X, \mathfrak{M})$ is a function $\alpha : X \to [0, 1]$ with

$$\sum_{x \in E} \alpha(x) = 1 \quad \text{for every } E \in \mathfrak{M}.$$

The intended interpretation is that each $E \in \mathfrak{M}$ is the set of mutually exclusive outcomes
associated with some probabilistic experiment, anything from rolling a die to asking a question to
making a measurement (via some well-defined procedure) of some physical quantity. It is permitted
that distinct tests may overlap, that is, that distinct experiments may share some outcomes. The
definition of a probability weight requires that, when this is the case, the probability of a given
outcome be independent of the measurement used to secure it. In other words, probability weights
are non-contextual.¹
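Definition 1 is easy to check mechanically for finite test spaces. In this minimal sketch (the function name `is_probability_weight` and the toy test space are ours), a test space is a list of frozensets and a weight is a dictionary from outcomes to exact fractions. Because the weight is a single function on outcomes, the shared outcome `z` automatically receives the same probability in both tests, which is exactly the non-contextuality just described.

```python
from fractions import Fraction as F


def is_probability_weight(tests, alpha):
    """Definition 1: alpha maps every outcome into [0, 1] and sums to 1 over every test."""
    outcomes = set().union(*tests)
    if set(alpha) != outcomes:
        return False
    if not all(0 <= alpha[x] <= 1 for x in outcomes):
        return False
    return all(sum(alpha[x] for x in test) == 1 for test in tests)


# Hypothetical toy test space: two experiments that share the outcome "z".
tests = [frozenset({"x", "y", "z"}), frozenset({"z", "w"})]

# alpha assigns z a single value, used in both tests (non-contextuality is built in).
alpha = {"x": F(1, 5), "y": F(3, 10), "z": F(1, 2), "w": F(1, 2)}
gamma = {"x": F(0), "y": F(0), "z": F(1), "w": F(0)}
# beta fails: z and w sum to 3/4 on the second test.
beta = {"x": F(1, 5), "y": F(3, 10), "z": F(1, 2), "w": F(1, 4)}

# Mixtures of weights are again weights: the set of all weights is convex.
t = F(1, 3)
mix = {o: t * alpha[o] + (1 - t) * gamma[o] for o in alpha}
```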

It will be convenient to use the same letter, $X$, to denote the entire test space $(X, \mathfrak{M})$, as well as
its outcome-set, leaving the set of tests tacit. When necessary, we'll write $\mathfrak{M}(X)$ for the latter. We
also write $\Omega(X)$ for the set of all probability weights on $X$. This is a convex subset of $[0, 1]^X \subseteq \mathbb{R}^X$,
i.e.,

$$\alpha, \beta \in \Omega(X) \implies t\alpha + (1 - t)\beta \in \Omega(X)$$

for all $0 \leq t \leq 1$. Where $X$ is locally finite, meaning that every test $E \in \mathfrak{M}(X)$ is a finite set, it is
not hard to see that $\Omega(X)$ is closed, and hence compact, with respect to the product topology on
$[0, 1]^X$. It follows (by the Krein-Milman theorem) that $\Omega(X)$ is the closed convex hull of its extreme points.

Models In constructing a model for a probabilistic system, we may wish to single out certain 
probability weights as corresponding to possible states of the system. It is reasonable to form 
probability-weighted averages of such states, in order to represent ensembles of systems in different 
states. It is also reasonable to idealize the situation slightly by assuming that the limit of a sequence 
of possible states should again count as a possible state. In the same spirit, we shall assume in what 
follows that X carries a Hausdorff topology, with respect to which states are continuous. This is 
harmless, since we can always use the discrete topology as a default! Indeed, given that our focus 
here is exclusively on finite-dimensional models, it is not unreasonable to assume that X is even 
compact. 

To make all of this official: 

Definition 2. A probabilistic model (or, for purposes of this paper, just a model) is a structure
$(X, \Omega)$, where $X$ is a Hausdorff test space and $\Omega$ is a pointwise-closed (hence, compact), convex
set of continuous probability weights on $X$, that is, a closed convex subset of $\Omega(X)$. The extreme points of $\Omega$ are the pure states of the
model.

Notation: We henceforth use capital letters $A$, $B$, etc. to denote models, writing, e.g., $(X(A), \mathfrak{M}(A))$
for the test space belonging to model $A$, and $\Omega(A)$ for $A$'s state space. (So technically,
$A = ((X(A), \mathfrak{M}(A)), \Omega(A))$.)

Example 1 (Classical Models). (a) The simplest classical models have the structure $(E, \Delta(E))$,
where $E$ is a single test (so that $\mathfrak{M}(E) = \{E\}$), and where $\Delta(E)$ is the simplex of all probability
weights thereon. We might also deem "classical" a broader set of models: those of the form $(E, \Omega)$
where $\Omega \subseteq \Delta(E)$ is any closed, convex set of probability weights sufficiently large to statistically
separate different outcomes³ of the single test $E$.

(b) A more sophisticated classical model begins with a measurable space $S$, and identifies statistical
experiments with finite or countably infinite partitions of $S$ by measurable subsets. The collection of
all such experiments is a test space: let $\mathcal{X}(S)$ be the set of non-empty measurable subsets of $S$ (say,
with the discrete topology), and let $\mathcal{D}(S)$ be the set of countable partitions of $S$ into measurable
subsets. We call $(\mathcal{X}(S), \mathcal{D}(S))$ the Kolmogorovian test space associated with $S$. Probability weights
on $(\mathcal{X}(S), \mathcal{D}(S))$ correspond exactly to countably-additive probability measures on $S$.⁴

1 The formalism easily accommodates contextual probability assignments, however: simply define $\tilde{X}$ to be the
disjoint union of the tests in $\mathfrak{M}$; say, to be concrete, $\tilde{X} = \{(x, E) \mid x \in E \in \mathfrak{M}\}$. In effect, each outcome of $\tilde{X}$ consists
of an outcome of $X$, plus a record of which test was used to secure it. For each test $E \in \mathfrak{M}$, let $\tilde{E} = \{(x, E) \mid x \in E\}$,
and let $\tilde{\mathfrak{M}} = \{\tilde{E} \mid E \in \mathfrak{M}\}$. Probability weights on $(\tilde{X}, \tilde{\mathfrak{M}})$ are exactly what one means by contextual probability
weights on $(X, \mathfrak{M})$. There is a natural surjection $\tilde{X} \to X$ that simply forgets these records; probability weights on
$(X, \mathfrak{M})$ pull back along this surjection to give us weights on $(\tilde{X}, \tilde{\mathfrak{M}})$.

2 A more detailed discussion of test spaces with topological structure can be found in [68].

3 That is, given any pair of distinct outcomes, there exists a state assigning them different probabilities. 

4 By varying D(S), we can change the character of the probability weights that are allowed. For example, if we 
let D(S) include just the finite measurable partitions of S, then probability weights on D(S) correspond to finitely 
additive measures on S. 
Example 2 (Quantum Models). (a) The most basic quantum-mechanical model begins with
a complex Hilbert space $\mathcal{H}$. The quantum test space is $(X(\mathcal{H}), \mathfrak{M}(\mathcal{H}))$, where the outcome space
is the unit sphere of $\mathcal{H}$ (with its usual topology) and where the space $\mathfrak{M}(\mathcal{H})$ of tests is the set
of unordered orthonormal bases, or frames, of $\mathcal{H}$. Every unit vector $v \in \mathcal{H}$ determines a probability
weight $\alpha_v$ on $X(\mathcal{H})$, defined for all $x \in X(\mathcal{H})$ by

$$\alpha_v(x) = |\langle v, x \rangle|^2 = \mathrm{Tr}(P_v P_x),$$

where $P_v$ and $P_x$ are the rank-one projection operators corresponding to $v$ and $x$. Accordingly, if $W$
is a density operator on $\mathcal{H}$ (a positive hermitian operator of trace one, or, equivalently, a convex
combination of rank-one projections), then $\alpha_W(x) := \langle Wx, x \rangle = \mathrm{Tr}(W P_x)$ defines a probability
weight on $X(\mathcal{H})$. If $\dim(\mathcal{H}) \geq 3$, then Gleason's theorem tells us that every probability weight on
$X(\mathcal{H})$ is of this form, but for $\dim(\mathcal{H}) = 2$, there are many others, which one regards as non-physical.
In either case, letting $\Omega(\mathcal{H})$ denote the convex set of density operators on $\mathcal{H}$, we obtain the quantum
model $A(\mathcal{H}) = (X(\mathcal{H}), \Omega(\mathcal{H}))$.
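The recipe $\alpha_W(x) = \langle Wx, x \rangle$ can be checked on a qubit. In the sketch below, the density matrix `W` and the basis parameters `theta` and `phi` are arbitrary choices made for illustration, and vectors and matrices are plain Python lists. For any frame, the two probabilities should be non-negative and sum to $\mathrm{Tr}(W) = 1$, and multiplying an outcome vector by a phase should leave its probability unchanged.

```python
import cmath
import math


def inner(u, v):
    """<u, v> on C^n, linear in u and conjugate-linear in v."""
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))


def apply(w, x):
    """Matrix-vector product Wx."""
    return [sum(w[i][k] * x[k] for k in range(len(x))) for i in range(len(w))]


def alpha_w(w, x):
    """alpha_W(x) = <Wx, x> = Tr(W P_x) for a unit vector x."""
    return inner(apply(w, x), x).real


# Hypothetical density operator: Hermitian, trace one, determinant 0.115 > 0 (so positive).
W = [[0.75, 0.25 - 0.1j],
     [0.25 + 0.1j, 0.25]]

# An arbitrary orthonormal basis {u, u_perp} of C^2, i.e. one test of the quantum test space.
theta, phi = 0.3, 1.1
u = [math.cos(theta), cmath.exp(1j * phi) * math.sin(theta)]
u_perp = [-cmath.exp(-1j * phi) * math.sin(theta), math.cos(theta)]

probs = [alpha_w(W, u), alpha_w(W, u_perp)]
# A global phase on the outcome vector does not change its probability.
phase_check = alpha_w(W, [cmath.exp(0.7j) * c for c in u])
```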

A slightly different model, which we'll call the projective quantum model, and which we denote
by $A(\mathbb{P}\mathcal{H})$, replaces each outcome $x \in X(\mathcal{H})$ by the corresponding rank-one projection operator
$P_x$; tests in $\mathfrak{M}(\mathbb{P}\mathcal{H})$ are maximal pairwise orthogonal families of such projections. Again, states
correspond to density operators via the recipe $\alpha_W(P_x) = \mathrm{Tr}(W P_x)$, where $P_x \in X(\mathbb{P}\mathcal{H})$. For many
purposes, the choice between $A(\mathcal{H})$ and $A(\mathbb{P}\mathcal{H})$ is one of convenience. However, notice that in
passing from $A(\mathcal{H})$ to $A(\mathbb{P}\mathcal{H})$ we lose information about phase relations between the unit vectors
representing outcomes of $A(\mathcal{H})$, which are important in describing sequential experiments. We
won't pursue this here. The paper [74] contains some relevant discussion.

(b) A more sophisticated quantum model might begin with a $W^*$-algebra $\mathcal{A}$, and take for $\mathfrak{M}$ the
collection of all (say, finite) sets of projections summing to the identity in $\mathcal{A}$. If $\mathcal{A}$ has no $I_2$
summand, the Christensen-Yeadon extension of Gleason's theorem [28] identifies the probability
weights on $\mathfrak{M}$ with states on $\mathcal{A}$. Again, if there are $I_2$ factors (copies of $M_2(\mathbb{C})$), then one must
explicitly limit the states to the quantum-mechanical ones.

By the dimension of a model $A$, we mean the dimension of the span of $\Omega(A)$ in $\mathbb{R}^{X(A)}$. Of course,
this will in general be infinite. However, as mentioned in the introduction, our focus in this paper is
on finite-dimensional models. Indeed, making this official, we assume from this point forward that
all models are finite-dimensional. In particular, all quantum models $A(\mathcal{H})$ and $A(\mathbb{P}\mathcal{H})$ involve
only finite-dimensional Hilbert spaces $\mathcal{H}$.

If we let $\mathbf{V}(A)$ denote the span of $\Omega(A)$ in $\mathbb{R}^{X(A)}$, we can map $X(A)$ into $\mathbf{V}(A)^*$ by evaluation. That
is, for each outcome $x \in X(A)$, there is a canonical evaluation functional $\hat{x} : \mathbf{V}(A) \to \mathbb{R}$ given by
$\hat{x}(\alpha) = \alpha(x)$. It may happen that, for some sequence $x_i$ of outcomes, $\hat{x}_i \to a \in \mathbf{V}(A)^*$. Let us
say that $A$ is outcome-closed iff every such limit again corresponds to an outcome in $X(A)$, i.e., that
there exists some $x \in X(A)$ with $a = \hat{x}$. Where $X(A)$ is compact in its native topology (which, in
finite-dimensional examples, it very often is), this condition is automatically satisfied. We make it
another standing assumption that all models are outcome-closed.

Dispersion-Free States and Distinguishability One very striking difference between classical 
and quantum models has to do with the existence of (globally) dispersion-free, that is, zero-or-one 
valued, states. In both of the classical models considered above, all pure states are dispersion-free. 
Quantum models, in contrast, have no dispersion-free states: a pure quantum state still makes only
uncertain predictions about the results of most measurements.

Definition 3. A set $\Omega$ of probability weights on a test space $X$ is unital iff, for every $x \in X$, there
exists at least one $\alpha \in \Omega$ with $\alpha(x) = 1$. If, for every $x \in X$, there is a unique such state, we say that $\Omega$ is sharp. We
say that a model $A$ is unital or sharp if its state space $\Omega(A)$ is a unital, respectively sharp, set of
probability weights on the test space $X(A)$.

Like the classical examples, the quantum models $A(\mathcal{H})$ and $A(\mathbb{P}\mathcal{H})$ are sharp; indeed,
the unique state $\alpha$ assigning probability one to a given outcome $x \in X(\mathcal{H})$, or to the corresponding
outcome $P_x \in X(\mathbb{P}\mathcal{H})$, is the one corresponding to the density operator $P_x$.
Definition 4. A set $\Omega$ of probability weights on a test space $X$ separates outcomes, or is separating,
iff, for all outcomes $x, y \in X$, $\alpha(x) = \alpha(y)$ for all $\alpha \in \Omega$ implies $x = y$. A model $A$ is separated iff
$\Omega(A)$ separates outcomes of $X(A)$.

The state space of a standard quantum model $A(\mathcal{H})$ is not separating, since unit vectors $x$ and $e^{i\theta}x$ differing only by a phase
receive the same probability in every state; that of the corresponding projective quantum model $A(\mathbb{P}\mathcal{H})$ is separating.
As this example illustrates, given a non-separated model $A$, one can always replace $X(A)$ by an obvious quotient test space, in which probabilistically
indistinguishable outcomes are identified, to obtain a separated model having the same states. One
may or may not wish to do so.

A partition space is a test space that is isomorphic⁵ to a sub-test space of $\mathcal{D}(S)$ for some set $S$.
Any such space supports a separating set of dispersion-free probability weights, namely, the
point-masses associated with the points of $S$. The following is straightforward:

Lemma 1. If a test space has a unital, separating set of dispersion-free states, then it is a partition
test space. If it has a sharp, unital set of dispersion-free states, then it is classical.
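The first claim of Lemma 1 comes down to a construction: represent each outcome by the set of dispersion-free states that assign it probability one; unitality and separation then make each test a partition of the set $S$ of those states. The following sketch carries this out by brute-force enumeration for the square bit of Example 3 below; the names are hypothetical, and enumerating all $\{0,1\}$-valued assignments is feasible only for very small test spaces.

```python
from itertools import product

# Square bit (Example 3): two two-outcome tests.
tests = [("x", "x'"), ("y", "y'")]
outcomes = [o for t in tests for o in t]


def dispersion_free_states(tests, outcomes):
    """All {0,1}-valued probability weights, found by brute force."""
    states = []
    for values in product((0, 1), repeat=len(outcomes)):
        s = dict(zip(outcomes, values))
        if all(sum(s[o] for o in t) == 1 for t in tests):
            states.append(s)
    return states


df = dispersion_free_states(tests, outcomes)
S = frozenset(range(len(df)))  # the "points" of the classical sample space

# Represent each outcome by the set of (indices of) DF states assigning it probability 1.
rep = {o: frozenset(i for i, s in enumerate(df) if s[o] == 1) for o in outcomes}

unital = all(rep[o] for o in outcomes)                # every outcome is certain in some DF state
separating = len(set(rep.values())) == len(outcomes)  # distinct outcomes get distinct sets
partitions = all(                                     # each test becomes a partition of S
    frozenset().union(*(rep[o] for o in t)) == S and sum(len(rep[o]) for o in t) == len(S)
    for t in tests
)
```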

In anticipation of later results, we'll write $x \perp y$ to mean that outcomes $x, y \in X(A)$ are
distinguishable by means of some test $E \in \mathfrak{M}(A)$, that is, that $x, y \in E$ and $x \neq y$. At present,
there is no linear structure in view, let alone an inner product, so the notation is only suggestive.
Later, we'll see that one can often embed $X$ in an inner product space in such a way that the
notation can be taken literally.

It will also be useful to introduce the following notion of distinguishability for states. 

Definition 5. Two states $\alpha, \beta \in \Omega(A)$ are sharply distinguishable iff there exist outcomes
$x, y \in X(A)$ with $x \perp y$ such that $\alpha(x) = \beta(y) = 1$. More generally, states $\alpha_1, \ldots, \alpha_n$ are jointly sharply
distinguishable iff there exist a test $E \in \mathfrak{M}(A)$ and outcomes $x_1, \ldots, x_n \in E$ with $\alpha_i(x_j) = \delta_{ij}$.

The idea is that, if the system is known to be in one of the states $\alpha_1, \ldots, \alpha_n$, then by performing
the measurement $E$ we will learn, with probability one, which of these states was the actual
one.⁶
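Definition 5 can be checked by direct search on a finite test space. In this sketch (helper names ours) the states are three dispersion-free states of the square bit of Example 3 below; every pair of them turns out to be sharply distinguishable, but the three are not jointly sharply distinguishable, because no test has three outcomes. This illustrates the gap between pairwise and joint distinguishability noted in footnote 6.

```python
from itertools import permutations

# Square bit (Example 3): tests E = {x, x'} and F = {y, y'}.
tests = [("x", "x'"), ("y", "y'")]


def df(ex, ey):
    """Hypothetical helper: the dispersion-free state choosing ex in E and ey in F."""
    return {o: (1 if o in (ex, ey) else 0) for o in ("x", "x'", "y", "y'")}


def jointly_sharply_distinguishable(tests, states):
    """Search for a test E and distinct x_1..x_n in E with states[i][x_j] == delta_ij.

    Returns (E, (x_1, ..., x_n)) for the first witness found, or None.
    """
    n = len(states)
    for test in tests:
        for xs in permutations(test, n):
            if all(states[i][xs[j]] == (1 if i == j else 0) for i in range(n) for j in range(n)):
                return test, xs
    return None


a, b, c = df("x", "y"), df("x'", "y'"), df("x", "y'")
```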

2.2 Further Examples 

Classical and quantum examples hardly exhaust the possibilities, of course: the whole point of the 
present framework is to provide us with a maximum of flexibility in constructing ad hoc models.

Example 3 (The Square Bit). The very simplest non-classical model starts with a test space $X$
containing just two tests $E = \{x, x'\}$ and $F = \{y, y'\}$, each having two outcomes
(as, say, two coins, or a Stern-Gerlach apparatus with two angular settings). The convex set
$\Omega(X)$ of all probability weights on $X$ is affinely isomorphic to the unit square, under the mapping
$\alpha \mapsto (\alpha(x), \alpha(y))$. The model $(X, \Omega(X))$ has, accordingly, been called the square bit [12]. As $\Omega(X)$
is not a simplex, this model is not entirely classical. On the other hand, as its pure states are all
dispersion-free, it is very far from being "quantum".
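One concrete way to see that $\Omega(X)$ is not a simplex: in a simplex every point is a unique convex combination of vertices, but the center of the square bit is the midpoint of both diagonals. The sketch below (names ours) builds the four dispersion-free states, applies the map $\alpha \mapsto (\alpha(x), \alpha(y))$, and checks that the two diagonal mixtures coincide.

```python
from fractions import Fraction as F

outcomes = ("x", "x'", "y", "y'")


def df_state(ex, ey):
    """Dispersion-free state selecting outcome ex of E = {x, x'} and ey of F = {y, y'}."""
    return {o: (1 if o in (ex, ey) else 0) for o in outcomes}


def mix(t, s1, s2):
    """The mixture t*s1 + (1 - t)*s2."""
    return {o: t * s1[o] + (1 - t) * s2[o] for o in outcomes}


def coords(alpha):
    """The affine map alpha -> (alpha(x), alpha(y)) onto the unit square."""
    return (alpha["x"], alpha["y"])


v00, v11 = df_state("x'", "y'"), df_state("x", "y")
v01, v10 = df_state("x'", "y"), df_state("x", "y'")

half = F(1, 2)
center_1 = mix(half, v00, v11)  # midpoint of one diagonal
center_2 = mix(half, v01, v10)  # midpoint of the other diagonal
corners = {coords(v) for v in (v00, v01, v10, v11)}
```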

Greechie Diagrams A useful graphical device for representing small test spaces (those involving
only a few outcomes) is to represent each outcome as a dot, and to join outcomes belonging to a
test by a straight line or other smooth arc, with arcs corresponding to distinct tests intersecting,
if at all, at a sharp angle, so as to be easily distinguished. Such a representation (first used in the
quantum-logical literature) is called a Greechie diagram [36]. For example, we might represent a
three-outcome classical test by the diagram in Figure 1(a), and the square-bit test space by that
in Figure 1(b). The test space pictured in (c), with two three-outcome tests (the top and bottom

5 An isomorphism of test spaces is a bijection from outcomes to outcomes, preserving tests in both directions. 

6 A weaker notion would require only that $\alpha_i(x_i) > 0 = \alpha_i(x_j)$ for each $j \neq i$, so that with some non-zero probability
we obtain one of the outcomes $x_i$, and thus learn which state was actual. Notice, too, that the condition of joint sharp
distinguishability is a priori much stronger than pairwise sharp distinguishability.
rows) and three two-outcome tests (the vertical lines), makes the point that a test space need not
have any states at all.
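The reason is a counting argument: the two rows partition the six outcomes, and so do the three vertical lines, so any probability weight would have to give the whole outcome set total mass both 2 and 3. The sketch below checks the two partitions mechanically; the outcome labels are ours, and the check is only a sufficient condition for the absence of states, not a general decision procedure.

```python
# Hypothetical labels for the six dots of Figure 1(c): top row a1..a3, bottom row b1..b3.
top = ("a1", "a2", "a3")
bottom = ("b1", "b2", "b3")
rows = [set(top), set(bottom)]                     # two three-outcome tests
verticals = [{t, b} for t, b in zip(top, bottom)]  # three two-outcome tests
X = set(top) | set(bottom)


def is_partition(family, universe):
    """True if the sets in `family` are pairwise disjoint and cover `universe`."""
    return set().union(*family) == universe and sum(len(s) for s in family) == len(universe)


# If a family of tests partitions X, every probability weight gives X total mass
# equal to the number of tests in that family (each test contributes exactly 1).
forced_totals = {len(family) for family in (rows, verticals) if is_partition(family, X)}
has_no_states = len(forced_totals) > 1  # two different forced totals: contradiction
```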



Figure 1: Various Greechie diagrams, panels (a), (b) and (c).

The following whimsical example (due to D. J. Foulis) is useful as an antidote to several too- 
comfortable intuitions. 

Example 4 (The Firefly Box). Suppose a sealed triangular box is divided into three interior chambers,
as in the top-down view in Figure 2(a), below. The walls of the box are translucent, while
the top, the bottom, and the interior partitions are opaque. In the box is a firefly, free to move
about between the chambers (for which purpose, the interior partitions contain small tunnels).
Viewed from one side, we might see the firefly flashing in chamber $a$ or chamber $b$, or we might
see nothing: the firefly might not be flashing, or might be in chamber $c$. Thus, we have three
experiments, corresponding to the three walls of the box: $\{a, x, b\}$, $\{b, y, c\}$ and $\{c, z, a\}$, where $x$, $y$
and $z$ are the (distinct) "no-light" outcomes associated with each experiment. The resulting test
space $\mathfrak{M} = \{\{a, x, b\}, \{b, y, c\}, \{c, z, a\}\}$ has the Greechie diagram pictured in Figure 2(b) below.
Figure 2: The Firefly Box, panels (a), (b) and (c).

We can identify several pure states on this test space with concrete situations involving the
location, and the internal state (lit or unlit), of the firefly. For example,

$$\alpha(a) = \alpha(y) = 1; \quad \alpha(b) = \alpha(c) = \alpha(x) = \alpha(z) = 0$$

corresponds to the firefly's flashing in chamber $a$. We can define similar states $\beta$ and $\gamma$ corresponding
to chambers $b$ and $c$. All of these states are dispersion-free. A fourth dispersion-free pure state,
$\delta$, assigns probability 1 to the outcomes $x$, $y$ and $z$. This corresponds to the firefly not flashing.
These four dispersion-free states separate the six outcomes, and thus allow us, by
Lemma 1, to represent the firefly box as a partition test space over a classical state space. However,
there is also a fifth, non-dispersion-free pure state, $\epsilon$, given by

$$\epsilon(a) = \epsilon(b) = \epsilon(c) = 1/2; \quad \epsilon(x) = \epsilon(y) = \epsilon(z) = 0.$$

This last state is difficult to interpret in any way but to imagine that the firefly responds to being
observed through a given window by entering (with equal probability) one of the two corresponding
chambers. Since any state on this test space is determined by its values at the outcomes $a$, $b$ and
$c$ (as $\alpha(x) = 1 - \alpha(a) - \alpha(b)$, and similarly for $y$ and $z$), the convex set of all probability weights for the firefly box
is a non-simplicial set in $\mathbb{R}^3$: the pure states $\alpha$, $\beta$ and $\gamma$ correspond to the standard basis vectors $(1, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$,
$\delta$ corresponds to the origin, and $\epsilon$ to the vector $\frac{1}{2}(1, 1, 1)$. Thus, $\Omega$ is affinely isomorphic to a triangular dipyramid,
as pictured in Figure 2(c).
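The claims about the firefly box can be verified mechanically. The sketch below (names ours) enumerates the dispersion-free states by brute force, checks that they separate the six outcomes and that $\epsilon$ is a probability weight, and uses the linear functional $\alpha(a) + \alpha(b) + \alpha(c)$, which is at most 1 on every dispersion-free state but equals $3/2$ at $\epsilon$, to confirm that $\epsilon$ is not a mixture of dispersion-free states. (It does not by itself prove that $\epsilon$ is extreme.)

```python
from fractions import Fraction as F
from itertools import product

tests = [("a", "x", "b"), ("b", "y", "c"), ("c", "z", "a")]
outcomes = ["a", "b", "c", "x", "y", "z"]


def is_weight(s):
    """Probability weight on the firefly test space (values in [0, 1], each test sums to 1)."""
    return all(0 <= s[o] <= 1 for o in outcomes) and all(sum(s[o] for o in t) == 1 for t in tests)


# Brute force over all {0,1}-valued assignments: these are the dispersion-free states.
candidates = (dict(zip(outcomes, v)) for v in product((0, 1), repeat=len(outcomes)))
df_states = [s for s in candidates if is_weight(s)]

# Each outcome's "profile" across the DF states; six distinct profiles means separating.
profiles = {o: tuple(s[o] for s in df_states) for o in outcomes}

half = F(1, 2)
epsilon = {"a": half, "b": half, "c": half, "x": 0, "y": 0, "z": 0}


def coords(s):
    """Coordinates (s(a), s(b), s(c)); x, y, z are then fixed by x = 1 - a - b, etc."""
    return (s["a"], s["b"], s["c"])


# a + b + c <= 1 on every DF state, hence on their convex hull; epsilon has a + b + c = 3/2.
df_bound = max(sum(coords(s)) for s in df_states)
```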
Example 5 (Grids and Graphs). Let $E$ be a finite set, for definiteness, say $\{0, 1, \ldots, n - 1\}$, with
$n \geq 2$. We define two test spaces associated with $E$:

(a) The grid test space, $\mathrm{Gri}(E)$, consists of all rows and columns of the $n \times n$ array $E \times E$, that
is, all sets of the form $\{x\} \times E$ or $E \times \{y\}$.

(b) The graph test space, $\mathrm{Gra}(E)$, consists of the graphs of permutations $f : E \to E$, that is,
subsets of $E \times E$ of the form $\{(i, f(i)) \mid i \in E\}$.

Both of these test spaces have outcome-set $X = E \times E$, so a state on either test space can be
regarded as an $n \times n$ real matrix with non-negative entries. In the case of $\mathrm{Gri}(E)$, these entries
must sum to unity along each row and column; that is, the states on $\mathrm{Gri}(E)$ are exactly the doubly
stochastic matrices. By the Birkhoff-von Neumann theorem, these all arise as convex combinations of
permutation matrices, that is, of the dispersion-free states corresponding to elements of $\mathrm{Gra}(E)$.
Similarly, one can show that, for $n \geq 3$, every state of $\mathrm{Gra}(E)$ is an average of row states $\alpha^k$, given
by $\alpha^k(i, j) = \delta_{i,k}$, and column states $\alpha_k$, given by $\alpha_k(i, j) = \delta_{k,j}$.
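The Birkhoff-von Neumann theorem has a simple constructive proof: repeatedly find a permutation whose graph lies in the support of the remaining matrix, and subtract as large a multiple of its permutation matrix as possible. The sketch below implements this greedy decomposition with exact fractions; the function names are ours, and the search over all $n!$ permutations (rather than a bipartite-matching algorithm) restricts it to small $n$.

```python
from fractions import Fraction as F
from itertools import permutations


def is_doubly_stochastic(m):
    """Non-negative entries summing to 1 along every row and column (a state on Gri(E))."""
    n = len(m)
    return (all(v >= 0 for row in m for v in row)
            and all(sum(row) == 1 for row in m)
            and all(sum(m[i][j] for i in range(n)) == 1 for j in range(n)))


def birkhoff_decomposition(m):
    """Greedy Birkhoff-von Neumann decomposition into (weight, permutation) pairs.

    Brute-forces over all n! permutations, so only suitable for small n.
    """
    n = len(m)
    residual = [list(row) for row in m]
    terms = []
    while any(v != 0 for row in residual for v in row):
        # Find a permutation whose graph lies inside the support of the residual.
        for sigma in permutations(range(n)):
            if all(residual[i][sigma[i]] > 0 for i in range(n)):
                break
        else:
            raise ValueError("matrix is not (a multiple of) a doubly stochastic matrix")
        weight = min(residual[i][sigma[i]] for i in range(n))
        for i in range(n):
            residual[i][sigma[i]] -= weight
        terms.append((weight, sigma))
    return terms


h = F(1, 2)
m = [[h, h, 0],
     [h, 0, h],
     [0, h, h]]
terms = birkhoff_decomposition(m)

# Rebuild the matrix from the permutation matrices (the dispersion-free states on Gri(E)).
rebuilt = [[sum(w for w, sigma in terms if sigma[i] == j) for j in range(3)] for i in range(3)]
```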

Every pair of pure states on either $\mathrm{Gri}(E)$ or $\mathrm{Gra}(E)$ is distinguishable by a test in that space.
Nevertheless, neither state space is a simplex for $n \geq 3$. The space of doubly-stochastic matrices
has $n!$ pure states, which, for $n \geq 4$, exceeds the $n^2 + 1$ states permissible for a simplex in $\mathbb{R}^{n^2}$. For
$n \geq 3$, $\mathrm{Gra}(E)$ has only $2n$ pure states; however, the maximally mixed state $\alpha(i, j) = 1/n$ can be
represented as a uniform average over just the row states, or over just the column states; similarly,
on $\mathrm{Gri}(E)$, it can be represented as a uniform average over any set of permutations the graphs of
which partition $E \times E$. By a curious coincidence, the test spaces $\mathrm{Gri}(3)$ and $\mathrm{Gra}(3)$ are isomorphic,
so the state space of $\mathrm{Gri}(3)$ is isomorphic to that of $\mathrm{Gra}(3)$, and again, not a simplex.
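The two decompositions of the maximally mixed state on $\mathrm{Gra}(E)$ can be checked directly for $n = 3$. In this sketch (names ours), row and column states are built from the formulas above, each is verified to be a state on every permutation graph, and the uniform averages over rows and over columns are compared with $\alpha(i, j) = 1/n$.

```python
from fractions import Fraction as F
from itertools import permutations

n = 3
E = range(n)
# Gra(E): the graphs {(i, f(i))} of all permutations f of E.
graph_tests = [{(i, f[i]) for i in E} for f in permutations(E)]


def row_state(k):
    """alpha^k(i, j) = delta_{i,k}: probability 1 on every cell of row k."""
    return {(i, j): (1 if i == k else 0) for i in E for j in E}


def col_state(k):
    """alpha_k(i, j) = delta_{k,j}: probability 1 on every cell of column k."""
    return {(i, j): (1 if j == k else 0) for i in E for j in E}


def is_state(alpha):
    """Check that alpha sums to 1 over every permutation graph."""
    return all(sum(alpha[c] for c in t) == 1 for t in graph_tests)


def uniform_average(states):
    return {c: sum(F(s[c]) for s in states) / len(states) for c in states[0]}


rows_avg = uniform_average([row_state(k) for k in E])
cols_avg = uniform_average([col_state(k) for k in E])
maximally_mixed = {(i, j): F(1, n) for i in E for j in E}
```

Since the row states and the column states are different pure states, the maximally mixed state has two distinct decompositions into pure states, which cannot happen in a simplex.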

Remark: We've seen that a variety of convex geometries can arise more or less naturally as the (full)
state spaces of test spaces. A natural question is whether every possible convex geometry arises in
this way. A theorem of F. Shultz [57] shows that, in fact, every compact convex set can be represented
as the space of probability measures on an orthomodular lattice. The set of decompositions of the
unit element in such a lattice is a test space, the probability weights on which correspond precisely
to the probability measures on the lattice. Thus, Shultz's theorem implies that every compact convex
set can be realized as the full state space of a test space.

Models from Symmetry A symmetry of a test space $X$ is a bijection $g : X \to X$ such that
both $g$ and $g^{-1}$ preserve tests; in other words, such that for all $E \subseteq X$, we have $gE \in \mathfrak{M}(X)$
iff $E \in \mathfrak{M}(X)$. (That is, it is an isomorphism from the test space $X$ to itself.) The set
of all symmetries of $X$ is evidently a group, which we'll denote by $G(X)$. There is a natural dual
action of $G(X)$ on probability weights on $X$, given by $g\alpha := \alpha \circ g^{-1}$; a symmetry of a model
$A = (X, \Omega)$ is a symmetry of $X$ that also preserves $\Omega$. Again, the symmetries of a model form a
group, $G(A) \leq G(X(A))$.

Both classical and quantum test spaces are marked by very strong symmetry properties. In 
particular, the symmetry group of either kind of system acts transitively on pure states, and also on 
the set of tests; moreover, any permutation of the outcomes of any given test can be implemented 
by a symmetry of the entire system. (This is more or less trivial in the case of a classical system; 
for a quantum system, it amounts to the observation that any permutation of an orthonormal basis 
for a Hilbert space $\mathcal{H}$ extends to a unitary operator on $\mathcal{H}$.) In contrast, no symmetry of the "firefly
box" test space of Example 4 will exchange one of the outcomes a, b, c with one of x, y, z, since each 
of the former belongs to two tests, while each of the latter belongs only to one. 
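For a test space this small, the full symmetry group $G(X)$ can be found by brute force over all $6! = 720$ bijections of the outcomes. The following sketch (names ours) does so for the firefly box and confirms that no symmetry moves one of $a, b, c$ to one of $x, y, z$; the group that remains consists of the six symmetries of the triangle.

```python
from itertools import permutations

outcomes = ["a", "b", "c", "x", "y", "z"]
tests = {frozenset(t) for t in [("a", "x", "b"), ("b", "y", "c"), ("c", "z", "a")]}


def symmetries(outcomes, tests):
    """All bijections g of the outcome set such that gE is a test iff E is (brute force)."""
    found = []
    for image in permutations(outcomes):
        g = dict(zip(outcomes, image))
        # For a finite test space, g preserves tests in both directions
        # iff the set of images of the tests equals the set of tests.
        if {frozenset(g[o] for o in t) for t in tests} == tests:
            found.append(g)
    return found


group = symmetries(outcomes, tests)
# Does any symmetry send a "corner" outcome (in two tests) to a "no-light" outcome (in one)?
mixes_kinds = any(g[o] in {"x", "y", "z"} for g in group for o in ("a", "b", "c"))
```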

Definition 6. Let $G$ be a group acting by symmetries on a test space $X$. We say $X$ is symmetric
under $G$, or $G$-symmetric, iff $G$ acts transitively on $\mathfrak{M}(X)$, and the stabilizer $G_E$ of a test $E \in \mathfrak{M}(X)$
acts transitively on $E$. If $X$ is $G$-symmetric and $G_E$ acts doubly transitively on $E$, then $X$ is
2-symmetric under $G$. If $G_E$ acts as the full permutation group of $E$, we say that $X$ is fully
$G$-symmetric.
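Definition 6 can likewise be checked mechanically, taking $G$ to be the full symmetry group computed by brute force. Once $G$ is transitive on tests, the stabilizers of different tests are conjugate, so it suffices to examine a single test $E$. In this sketch (names ours), the square bit comes out fully symmetric, while the firefly box fails already at transitivity on $E$, since the stabilizer of $\{a, x, b\}$ never moves $a$ to $x$.

```python
from itertools import permutations


def symmetry_group(outcomes, tests):
    """All bijections g of the outcomes with {gE} == tests (brute force over permutations)."""
    test_set = set(tests)
    group = []
    for image in permutations(outcomes):
        g = dict(zip(outcomes, image))
        if {frozenset(g[o] for o in t) for t in tests} == test_set:
            group.append(g)
    return group


def check_definition_6(outcomes, tests):
    """Return (G-symmetric, fully G-symmetric) for G = the full symmetry group."""
    group = symmetry_group(outcomes, tests)
    E = tests[0]
    transitive_on_tests = {frozenset(g[o] for o in E) for g in group} == set(tests)
    stabilizer = [g for g in group if frozenset(g[o] for o in E) == E]
    order = sorted(E)
    transitive_on_E = {g[order[0]] for g in stabilizer} == set(E)
    induced = {tuple(g[o] for o in order) for g in stabilizer}  # permutations of E realized
    full = len(induced) == len(list(permutations(order)))
    symmetric = transitive_on_tests and transitive_on_E
    return symmetric, symmetric and full


square_bit = (["x", "x'", "y", "y'"], [frozenset({"x", "x'"}), frozenset({"y", "y'"})])
firefly = (["a", "b", "c", "x", "y", "z"],
           [frozenset(t) for t in [("a", "x", "b"), ("b", "y", "c"), ("c", "z", "a")]])

square_result = check_definition_6(*square_bit)
firefly_result = check_definition_6(*firefly)
group_order = len(symmetry_group(*square_bit))
```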

In fact, test spaces with these symmetry properties can be constructed very naturally [72].
Suppose one has a simple measuring device, which can be applied to a system of some sort to
produce outcomes in a set $E$. One might be able to apply this device in different ways, for
example, by changing the orientation of the apparatus with respect to the system, or by adjusting
some controllable physical parameters associated with the system. This suggests that we might
be able to build a larger family of experiments (a test space, in other words) starting with
the basic measurement $E$, and adding parameters that keep track of the various ways in which we
might deploy it. In many cases, there will be a group $G$ of "physical symmetries" acting on these
parameters, and we can often reconstruct the desired test space simply from a knowledge of this
group and its relationship to the test $E$. Specifically, there will be some subgroup $H$ of $G$ that
acts to permute the outcomes of $E$. Let us suppose that $H$ acts transitively on $E$, so that, for any
reference outcome $x_0 \in E$, every other outcome $x \in E$ has the form $hx_0$ for some $h \in H$. Let
$K$ be any subgroup of $G$ such that $K \cap H = H_{x_0}$, where $H_{x_0}$ is the stabilizer in $H$ of the chosen
reference outcome $x_0 \in E$, and set $X = G/K$. Then there is a well-defined canonical $H$-equivariant
injection $j : E \to X$ given by $j(x) = hK$ where $x = hx_0$. Let us identify $E$ with its image under $j$,
so that $E \subseteq X$, and let $\mathfrak{M}$ be the orbit of $E$ under $G$, i.e.,

$$\mathfrak{M} := \{gE \mid g \in G\}.$$

The test space $(X, \mathfrak{M})$ will automatically be symmetric, and will be 2-symmetric or fully symmetric
under $G$ as $H$ acts doubly or fully transitively on $E$. We obtain a $G$-symmetric model by choosing
any $G$-invariant, closed, convex set of probability weights on $X$.

The choice of the group $K$ extending the stabilizer $H_{x_0}$ has a large effect on the combinatorial
structure of $(X, \mathfrak{M})$. For example, if $K = H_{x_0}$, then $\mathfrak{M}$ is a semi-classical test space consisting of
disjoint copies of $E$; in general, a larger choice of $K$ will enforce non-trivial intersections among the
tests $gE$ with $g \in G$.

Example 6. As an illustration of this construction, let $E = \{0, 1, \ldots, n - 1\}$, and let $U$ be the
group of all unitary $n \times n$ matrices, acting in the usual way on $\mathcal{H} = \mathbb{C}^E$. Let $H \leq U$ be the
subgroup consisting of permutation matrices, and $K$ the group of unitaries fixing $e_0$, the column
vector corresponding to $0 \in E$. Then $K \cap H$ is exactly the set of permutation matrices corresponding
to permutations fixing $0$, i.e., $K \cap H = H_0$. Now $X = U/K$ is the (projective) unit sphere of $\mathcal{H}$,
and $\mathfrak{M}$ is the set of (projective) frames of $\mathcal{H}$. For another example, let $H$ be the full permutation
group $S(E)$ of $E$ and set $G = S(E) \times S(E)$. Embedding $H$ in $G$ by $h \mapsto (h, e)$, the construction
above produces the "grid" test space $\mathrm{Gri}(E)$ of Example 5. Using instead the diagonal embedding
$h \mapsto (h, h)$ yields the "graph" test space $\mathrm{Gra}(E)$.
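The coset construction can be run explicitly for small $n$. This sketch (names ours) takes $G = S(E) \times S(E)$ with $n = 3$ and the diagonal embedding of $H = S(E)$, represents elements of $X = G/K$ as frozensets of group elements, and compares two choices of $K$: the stabilizer of $(0, 0)$ in $G$, which is our assumption since the text leaves $K$ unspecified here, and the minimal choice $K = H_0$. The first should reproduce the six permutation graphs of $\mathrm{Gra}(3)$, each outcome lying in two tests; the second should give six pairwise disjoint copies of $E$.

```python
from itertools import permutations

n = 3
perms = list(permutations(range(n)))  # S(E) for E = {0, ..., n-1}, as tuples


def compose(p, q):
    """(p o q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(n))


def gmul(g, k):
    """Componentwise product in G = S(E) x S(E)."""
    return (compose(g[0], k[0]), compose(g[1], k[1]))


G = [(p, q) for p in perms for q in perms]
H = [(h, h) for h in perms]  # diagonal embedding h -> (h, h)


def coset(g, K):
    """The left coset gK, as a frozenset of group elements."""
    return frozenset(gmul(g, k) for k in K)


def build_test_space(K):
    """X = G/K, E = the image of H in X, and tests M = {gE : g in G}."""
    X = {coset(g, K) for g in G}
    E = frozenset(coset(h, K) for h in H)
    tests = {frozenset(frozenset(gmul(g, c) for c in C) for C in E) for g in G}
    return X, tests


# Larger choice (our assumption): K = stabilizer of (0, 0), so K intersect H = H_0.
K_big = [g for g in G if g[0][0] == 0 and g[1][0] == 0]
# Smallest choice: K = H_0 itself.
K_small = [g for g in H if g[0][0] == 0]

X_big, M_big = build_test_space(K_big)
X_small, M_small = build_test_space(K_small)
```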

2.3 Models Linearized 

In many situations, the outcomes of a test space are naturally represented as elements of a vector 
space. This is obviously the case for the quantum-mechanical examples discussed above, where 
outcomes are directly identified with unit vectors in $\mathcal{H}$ or with rank-one projections in $L(\mathcal{H})$. One
can also formulate classical probability theory in this way, by considering the space of random 
variables associated with a given measurable space, and identifying measurement outcomes (that is, 
measurable sets) with the corresponding indicator random variables. 

In fact, subject to some fairly mild restrictions, such a representation is always available. The idea will be to construct, for each model $A = (X, \Omega)$, a real vector space $E(A)$ and an embedding $X \to E(A)$, in such a way that states in $\Omega$ extend uniquely to linear functionals on $E(A)$. In fact, $E(A)$ will be an ordered real vector space, so we pause briefly to review this notion (for further details, see [2]).

Ordered Linear Spaces By a cone in a real vector space $E$, we mean a convex subset $K$ closed under multiplication by non-negative scalars, and satisfying $K \cap -K = \{0\}$. $K$ is generating iff it spans $E$. An ordered linear space is a real vector space $E$, equipped with a closed, generating cone $E_+$. Such a cone determines a (partial) ordering on $E$, invariant under translation and under positive scalar multiplication, namely $a \le b$ iff $b - a \in E_+$. Noticing that $a \ge 0$ iff $a \in E_+$, we refer to $E_+$ as the positive cone of $E$.

7 Some authors define ordered linear spaces without requiring that the positive cone be generating. For our purposes, the present definition is more useful.

The basic example is the space $\mathbb{R}^X$ of all real-valued functions on a set $X$, ordered pointwise. Thus,

$$(\mathbb{R}^X)_+ = \{f \in \mathbb{R}^X \mid f(x) \ge 0 \ \forall x \in X\}.$$

Another example, central to our concerns here, is the space $L_h(\mathcal{H})$ of bounded hermitian operators on a Hilbert space $\mathcal{H}$ (over either $\mathbb{R}$ or $\mathbb{C}$). This space has a standard ordering, induced by the cone $L_+(\mathcal{H})$ of positive semi-definite operators; that is, $a \in L_+(\mathcal{H})$ iff $\langle ax, x \rangle \ge 0$ for all vectors $x \in \mathcal{H}$. More generally, the real vector space of self-adjoint elements of a C*-algebra $\mathcal{A}$ is ordered by the cone of elements of the form $aa^*$, $a \in \mathcal{A}$.
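To make the two orderings concrete, here is a minimal Python sketch (an illustration, not part of the original text) that tests membership in $(\mathbb{R}^X)_+$ pointwise and in $L_+(\mathcal{H})$ for $\mathcal{H} = \mathbb{C}^2$. For $2 \times 2$ Hermitian matrices it uses the standard criterion that $a \ge 0$ iff both diagonal entries and the determinant are non-negative; the helper names and the tolerance `tol` are assumptions of the sketch.

```python
def is_positive_function(f, tol=0.0):
    """Membership in (R^X)_+: f (a dict x -> f(x)) is non-negative pointwise."""
    return all(value >= -tol for value in f.values())


def is_psd_2x2(a, tol=1e-12):
    """Membership in L_+(C^2) for a Hermitian matrix a = [[p, c], [conj(c), q]].

    The eigenvalues of a have sum p + q and product det(a) = p*q - |c|^2, so both
    are non-negative iff p >= 0, q >= 0 and det(a) >= 0.
    """
    p, q, c = a[0][0].real, a[1][1].real, a[0][1]
    return p >= -tol and q >= -tol and p * q - abs(c) ** 2 >= -tol


def leq(a, b):
    """The operator order on L_h(C^2): a <= b iff b - a is positive semi-definite."""
    diff = [[b[i][j] - a[i][j] for j in range(2)] for i in range(2)]
    return is_psd_2x2(diff)


identity = [[1, 0], [0, 1]]
projection = [[1, 0], [0, 0]]
print(leq(projection, identity), leq(identity, projection))        # True False
print(is_psd_2x2([[1, 1j], [-1j, 1]]), is_psd_2x2([[1, 2], [2, 1]]))  # True False
```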

If $E$ and $F$ are ordered linear spaces, a linear mapping $f : E \to F$ is positive iff $f(E_+) \subseteq F_+$, i.e., $f(a) \ge 0$ whenever $a \ge 0$. An order-isomorphism between $E$ and $F$ is a positive, invertible linear mapping having a positive inverse. We'll denote the set of positive linear mappings $E \to F$ by $L_+(E, F)$. This is a cone in the space $L(E, F)$. As a special case, the dual space of an ordered vector space $E$ has a natural dual cone, $E^*_+ = L_+(E, \mathbb{R})$. In our present finite-dimensional setting, this is generating, so $E^*$ becomes an ordered vector space in a natural way.

Order-unit spaces An order unit in an ordered linear space $E$ is an element $u \in E_+$ such that, for every $a \in E$, there exists some $n \in \mathbb{N}$ with $a \le nu$. When $E$ is finite-dimensional, this is equivalent to asking that $f(u) > 0$ for every non-zero $f \in E^*_+$, which can always be arranged. (In particular, a finite-dimensional ordered linear space always has an order unit.) An order-unit space is an ordered linear space equipped with a distinguished order unit. The key example to bear in mind is the space $L_h(\mathcal{H})$, ordered as described above, and with the identity operator as order unit.

An order-unit space already provides enough structure to support probabilistic ideas. A state on an order-unit space $E$ is a positive linear functional $\alpha \in E^*_+$ with $\alpha(u) = 1$. An effect in $E$ is a positive element $a$ with $a \le u$, so that $0 \le \alpha(a) \le 1$ for every state $\alpha$. A discrete observable on $E$ is a finite set $E = \{a_1, \ldots, a_k\}$ of non-zero effects with $a_1 + \cdots + a_k = u$; evidently, any state on $E$ restricts to a probability weight on every observable on $E$. Thus, the observables form a test space, the outcomes of which are just the non-zero effects in $E$. In the special case where $E = L_h(\mathcal{H})$, the space of Hermitian operators on a Hilbert space $\mathcal{H}$, an effect is a positive operator $a$ with $0 \le a \le 1$; all states have the form $\alpha(a) = \mathrm{Tr}(Wa)$, where $W$ is a density operator on $\mathcal{H}$, and an observable is essentially a (discrete) positive-operator-valued measure.
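As a concrete (hypothetical) instance in the real two-dimensional case $\mathcal{H} = \mathbb{R}^2$, the sketch below builds the three-outcome "trine" observable $a_k = \tfrac{2}{3} P_{v_k}$, with unit vectors $v_k$ at angles $0$, $\pi/3$ and $2\pi/3$, checks that $a_1 + a_2 + a_3 = u$ (the identity), and evaluates a state $\alpha(a) = \mathrm{Tr}(Wa)$ on it. The choice of observable and of $W$ is illustrative only.

```python
import math


def outer(v):
    """The rank-one projection |v><v| for a real unit 2-vector v."""
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]


def scale(c, m):
    return [[c * x for x in row] for row in m]


def mat_sum(ms):
    return [[sum(m[i][j] for m in ms) for j in range(2)] for i in range(2)]


def trace_product(w, a):
    """Tr(W a) for 2x2 matrices: the probability the state W assigns to effect a."""
    return sum(w[i][j] * a[j][i] for i in range(2) for j in range(2))


# Hypothetical "trine" observable: a_k = (2/3)|v_k><v_k|, v_k at angle k*pi/3.
angles = [k * math.pi / 3 for k in range(3)]
trine = [scale(2 / 3, outer((math.cos(t), math.sin(t)))) for t in angles]
total = mat_sum(trine)              # should equal the order unit u = identity

W = [[1.0, 0.0], [0.0, 0.0]]        # density operator of the pure state e_0
probs = [trace_product(W, a) for a in trine]
print(probs, sum(probs))            # approximately [2/3, 1/6, 1/6] and 1
```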

The set of all (normalized) states on an order-unit space $E$ is the latter's state space. This is always a compact convex set. Conversely, if $\Omega$ is any compact convex subset of any finite-dimensional real vector space, let $\mathrm{Aff}(\Omega)$ denote the space of bounded affine (that is, convex-combination-preserving) real-valued functionals $f : \Omega \to \mathbb{R}$, ordered pointwise. The constant functional $u(\alpha) = 1$ serves as an order unit. One can show that $\Omega$ (embedded in $\mathrm{Aff}(\Omega)^*$ by evaluation) is exactly $\mathrm{Aff}(\Omega)$'s state space. Moreover, if $T : \Omega \to W_+$ is any affine mapping of $\Omega$ into the positive cone of a (finite-dimensional) ordered linear space $W$, then $T$ extends uniquely to a positive linear mapping $T : \mathrm{Aff}(\Omega)^* \to W$.

The linear hull of a model Any probabilistic model can be interpreted, in a canonical way, in terms of an order-unit space with a distinguished family of observables. Let $A = (X, \Omega)$ be a probabilistic model. Every outcome $x \in X(A)$ determines an affine functional $\hat{x} : \Omega \to \mathbb{R}$ by evaluation: $\hat{x}(\alpha) = \alpha(x)$ for all $\alpha \in \Omega$.

Definition 7. If $A = (X, \Omega)$ is a model, write $E(A)$ for the span of $\{\hat{x} \mid x \in X\}$ in $\mathbb{R}^{\Omega}$, ordered by the closure of the cone consisting of linear combinations with non-negative coefficients of evaluation functionals $\hat{x}$, $x \in X(A)$:

$$E(A)_+ = \mathrm{cl}\left\{ \sum_i t_i \hat{x}_i \;\middle|\; t_i \ge 0,\ x_i \in X \right\}.$$

Letting $u \in \mathbb{R}^{\Omega}$ denote the constant function $u(\alpha) = 1$, we see that $u = \sum_{x \in E} \hat{x}$, where $E$ is any test in $\mathcal{M}(A)$. Hence, $u$ belongs to $E(A)_+$, where it functions as an order unit. The order-unit space $(E(A), u)$, together with the embedding $X(A) \to E(A)$, is called the linear hull of the model $A$.
Every test $E \in \mathcal{M}(A)$ can now be regarded as a discrete observable on $E(A)$. Notice that the cone $E(A)_+$ may well be smaller than the cone $\{a \in E(A) \mid a(\alpha) \ge 0 \ \forall \alpha \in \Omega\}$ inherited from $\mathrm{Aff}(\Omega(A))_+$, and that, unlike the latter, it depends on the choice of $X(A)$.
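For a finite model whose state space is a polytope, $\dim E(A)$ can be computed numerically: each $\hat{x}$ is affine, so it is determined by its values at the extreme points of $\Omega$, and $\dim E(A)$ is the rank of the resulting outcome-by-vertex matrix. The Python sketch below does this for the square bit (two two-outcome tests), under the assumption that its state space is the full square of pairs $(\alpha(x_0), \alpha(y_0)) \in [0,1]^2$; the outcome labels `x0`, `x1`, `y0`, `y1` are chosen for illustration. It also confirms that both tests sum to the same order unit $u$.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a rational matrix (list of rows), by Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [p - factor * q for p, q in zip(m[i], m[r])]
        r += 1
    return r


# Square bit (assumed): tests {x0, x1} and {y0, y1}; states are all pairs
# (alpha(x0), alpha(y0)) in [0, 1]^2, so Omega has these four extreme points.
vertices = [(0, 0), (0, 1), (1, 0), (1, 1)]
hat = {
    "x0": [v[0] for v in vertices],
    "x1": [1 - v[0] for v in vertices],
    "y0": [v[1] for v in vertices],
    "y1": [1 - v[1] for v in vertices],
}
dim_E = rank(list(hat.values()))
u_from_x = [p + q for p, q in zip(hat["x0"], hat["x1"])]
u_from_y = [p + q for p, q in zip(hat["y0"], hat["y1"])]
print(dim_E, u_from_x == u_from_y)   # 3 True
```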

Example 7. In the case of a quantum model $A = (X(\mathcal{H}), \Omega(\mathcal{H}))$ of Example 2, the space $E(A)$ (or, as we'll denote it below, $E(\mathcal{H})$) can be identified with the order-unit space of Hermitian operators on $\mathcal{H}$, ordered by the usual cone, with $u$ the identity operator.

There is a canonical embedding of $\Omega(A)$ in $\mathrm{Aff}(\Omega)^*$, taking each state $\alpha \in \Omega(A)$ to the corresponding evaluation functional $f \mapsto f(\alpha)$, $f \in \mathrm{Aff}(\Omega)$. Let $V(A)$ denote the span of $\Omega(A)$ in $\mathrm{Aff}(\Omega)^*$, ordered by the cone $V_+(A)$ generated by $\Omega(A)$. Since $E(A) \subseteq \mathrm{Aff}(\Omega)$, we have a natural duality between $V(A)$ and $E(A)$; to put it another way, there is a natural linear mapping $V(A) \to E(A)^*$, taking each $\alpha \in \Omega$ to the corresponding evaluation functional in $E(A)^*$. Since states are, for us, probability weights on $X(A)$, this mapping is injective.

State-Completeness If $A = (X, \Omega)$ is a model, with linear hull $E(A)$, then any positive linear functional $\alpha \in E(A)^*$ with $\alpha(u) = 1$ (that is, any state on $E(A)$) defines a probability weight on $X(A)$ by restriction. Let $\overline{\Omega}$ denote the set of such states. Obviously, $\Omega \subseteq \overline{\Omega}$. We may regard $\overline{\Omega}$ as the set of probability weights that are consistent with all of the linear relations among outcomes that are satisfied by the given state space $\Omega$. Evidently, the assignment $\Omega \mapsto \overline{\Omega}$ is a closure on the poset of closed convex subsets of $\Omega(X)$. Call a model state-complete iff $\overline{\Omega} = \Omega$.

Lemma 2. Let $A = (X, \Omega)$ be a finite-dimensional probabilistic model. Then the following are equivalent:

(a) $A$ is state-complete;

(b) $E(A)_+ = E(A) \cap \mathrm{Aff}_+(\Omega) = E(A) \cap V(A)^*_+$;

(c) The canonical mapping $V(A) \to E(A)^*$ is surjective, hence an order-isomorphism.

Proof: To see that (a) implies (b), suppose $f \in E(A) \cap \mathrm{Aff}_+(\Omega)$ but $f \notin E(A)_+$. Then (by the finite-dimensional version of the Hahn-Banach separation theorem) there exists some $\alpha \in E(A)^*$ with $\alpha(a) \ge 0$ for all $a \in E(A)_+$ but $\alpha(f) < 0$. We can normalize $\alpha$ so that $\alpha(u) = 1$, in which case $\alpha \in \overline{\Omega}$. Since $f$ is non-negative on $\Omega$, it follows that $\alpha \notin \Omega$, whence $\overline{\Omega} \ne \Omega$, and $A$ is not state-complete. Conversely, if $\alpha \in \overline{\Omega} \setminus \Omega$, then we can find some $f \in E(A)^{**} = E(A)$ with $f(\alpha) < 0$ but $f(\beta) \ge 0$ for all $\beta \in \Omega$. But now $f \in E(A) \cap \mathrm{Aff}_+(\Omega)$, and yet, as $\alpha(a) \ge 0$ for all $a \in E(A)_+$, we have $f \notin E(A)_+$. Thus, (b) implies (a). As all systems here are finite-dimensional, (b) and (c) are clearly equivalent. □

Standing Assumption: Henceforth, all models are state-complete.

One might almost, at this point, regard the test space $X(A)$ as merely a sort of builder's scaffolding, to be discarded once the space $E(A)$ has been constructed. For many applications, this works perfectly well. However, the additional structure represented by $X$ turns out to be useful in many ways, so we prefer to retain it for present purposes. Doing so imposes no additional restrictions on the structure of $E(A)$ because, given an order-unit space $E$, we can always take $X$ to consist of all observables on $E$, as discussed above.

Direct Sums of Models A face of a convex set $K$ is a convex subset $J \subseteq K$ such that, for all $a, b \in K$ and all $0 < t < 1$,

$$ta + (1-t)b \in J \implies a \in J \text{ and } b \in J.$$

8 One of many uses for the test space structure is to privilege certain classes of observables on an order-unit space having special order-theoretic properties. For example, the set of observables whose outcomes lie on extremal rays of $E_+$ forms a test space, as does the set of observables whose outcomes are atomic effects, i.e., effects that lie on extremal rays of $E_+$ and are extreme points of $[0, u]$.
If $J$ and $K$ are cones, then this is equivalent to the condition that, for all $a, b \in K$, $a + b \in J \implies a \in J$ and $b \in J$. A minimal non-zero face of a cone is in fact a ray; we more usually speak of an extremal ray. An element of a cone is ray-extremal, or simply extremal, iff it generates an extremal ray. In finite dimensions, every (closed) cone is the convex hull of its extremal elements.

The direct sum of two ordered vector spaces $E$ and $F$ is their vector-space direct sum, $E \oplus F$, equipped with the cone $E_+ \oplus F_+$ consisting of all sums of positive elements from each. This is the smallest cone in $E \oplus F$ making the standard embeddings $E, F \to E \oplus F$ given by $a \mapsto (a, 0)$ and $b \mapsto (0, b)$ (for $a \in E$ and $b \in F$) positive. In this case, $E_+$ and $F_+$ are both faces of $E_+ \oplus F_+$. An ordered linear space $E$ is irreducible iff it is not a direct sum of two non-zero ordered linear spaces.

If $X$ and $Y$ are sets, we write $X \oplus Y$ for their coproduct (or disjointified union),

$$X \oplus Y = (\{1\} \times X) \cup (\{2\} \times Y).$$

If $X$ and $Y$ are test spaces, we make $X \oplus Y$ into a test space by letting $\mathcal{M}(X \oplus Y)$ equal the set $\{E \oplus F \mid E \in \mathcal{M}(X),\ F \in \mathcal{M}(Y)\}$. We can understand a test of the form $E \oplus F$ as a two-stage test: first, perform the classical two-outcome test $\{1, 2\}$ (by flipping a coin, say); if the result is 1, measure $E$; if the result is 2, measure $F$. A probability weight $\omega$ on $X \oplus Y$ corresponds to an arbitrary choice of a probability weight $p$ on $\{1, 2\}$ and probability weights $\alpha \in \Omega(X)$ and $\beta \in \Omega(Y)$, by

$$\omega(1, x) = p(1)\,\alpha(x) \quad\text{and}\quad \omega(2, y) = p(2)\,\beta(y).$$

Provided $0 < p(1) < 1$, the weights $p$, $\alpha$ and $\beta$ are uniquely determined by $\omega$, so we can unambiguously write

$$\omega = t\alpha + (1-t)\beta, \qquad t = p(1).$$

In other words, $\Omega(X \oplus Y) = \Omega(X) \oplus \Omega(Y)$, whence $E(X \oplus Y) = E(X) \oplus E(Y)$.
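The decomposition $\omega = t\alpha + (1-t)\beta$ is easy to recover in practice. The sketch below (Python, exact rational arithmetic) assumes $\omega$ is stored as a dictionary on tagged outcomes $(1, x)$ and $(2, y)$, that one test $E$ of $X$ and one test $F$ of $Y$ are supplied, and that $0 < t < 1$; the concrete numbers are hypothetical.

```python
from fractions import Fraction as Fr


def decompose(omega, E, F):
    """Recover (t, alpha, beta) with omega = t*alpha + (1 - t)*beta on X (+) Y.

    omega: dict on tagged outcomes (1, x) and (2, y); E: a test of X; F: a test of Y.
    Assumes 0 < t < 1, so that alpha and beta are uniquely determined.
    """
    t = sum(omega[(1, x)] for x in E)          # p(1), the coin's probability of 1
    if sum(omega[(2, y)] for y in F) != 1 - t:
        raise ValueError("omega is not a probability weight on the test E (+) F")
    alpha = {x: w / t for (tag, x), w in omega.items() if tag == 1}
    beta = {y: w / (1 - t) for (tag, y), w in omega.items() if tag == 2}
    return t, alpha, beta


# Hypothetical weight on X (+) Y with X = {a, b} and Y = {c, d, e}.
omega = {(1, "a"): Fr(1, 8), (1, "b"): Fr(1, 8),
         (2, "c"): Fr(1, 4), (2, "d"): Fr(1, 4), (2, "e"): Fr(1, 4)}
t, alpha, beta = decompose(omega, ["a", "b"], ["c", "d", "e"])
print(t, alpha["a"], beta["c"])  # 1/4 1/2 1/3
```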

Every discrete classical probabilistic model $(E, \Delta(E))$ is a direct convex sum of trivial models $(\{x\}, \delta_x)$, where $x \in E$ and $\delta_x(x) = 1$. In contrast, the basic quantum model $(X(\mathcal{H}), \Omega(\mathcal{H}))$ is irreducible. The more general models associated with matrix algebras arise as direct sums of irreducible quantum models.



2.4 Processes and Categories 

In very broad terms, a probabilistic theory might be nothing more than a class of probabilistic models. But this usage is really much too broad. Part of the job of a theory is to tell us not only which models represent "actual" systems, but also something about how such systems can change. In order to speak about systems changing, we need to introduce into the preceding formalism a notion of process. A natural place to start is with the idea of a mapping $\phi : \alpha \mapsto \phi(\alpha)$ taking states $\alpha$ of an initial (or input) system $A$ to states of a final (output) system $B$. To allow for "lossy" processes or conditioning, we should permit $\phi(\alpha)$ to be a sub-normalized state of $B$ when $\alpha$ is a normalized state of $A$. Finally, since randomizing the input state should randomize the output state in the same way, we should expect $\phi$ to be an affine mapping. Thus, we model a process from $A$ to $B$ by an affine mapping $\phi : \Omega(A) \to V(B)_+$ with $u_B(\phi(\alpha)) \le 1$; or, what is the same thing, by a positive linear mapping $\phi : E(A)^* \to E(B)^*$ with $u_B \circ \phi \le u_A$. We can interpret $u_B(\phi(\alpha))$ as the probability that $\phi$ occurs when the initial state is $\alpha$, or, perhaps more accurately, as the probability that the process occurs, if initiated.

To every process $\phi : V(B) \to V(A)$, there corresponds a dual process $\tau = \phi^* : E(A) \to E(B)$, given by $\phi^*(a) = a \circ \phi$ for any $a \in E(A)$. Operationally, to measure $\phi^*(a)$ on a state $\alpha$, one first subjects the state $\alpha$ to the process $\phi$, and then makes a measurement of the effect $a$. Note that $\tau(u_A)(\alpha) = u_A(\tau^*(\alpha))$ is the probability that the process $\tau^* = \phi$ occurs if the initial state is $\alpha$. In what follows, it will often be more convenient mathematically to deal with these dual processes. In other words, to use physicists' lingo, we'll often work with the "Heisenberg" rather than the "Schrödinger" picture of processes.
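The duality between the two pictures can be checked directly for classical systems, where states are probability vectors and a process is a substochastic matrix (column sums at most 1). In this minimal sketch, an illustration under those assumptions with a hypothetical matrix `M`, the function `phi` pushes states forward and `tau` pulls effects back via $\tau(a) = a \circ \phi$; it verifies $a(\phi(\alpha)) = \tau(a)(\alpha)$ and computes the probability $\tau(u)(\alpha)$ that the process occurs.

```python
from fractions import Fraction as Fr

# Hypothetical lossy process from a 2-outcome classical system to a 3-outcome one.
# Column i is the sub-normalized output distribution for input outcome i;
# column sums 3/4 and 2/3 are below 1, so the process may fail to occur.
M = [[Fr(1, 2), Fr(0)],
     [Fr(1, 4), Fr(1, 3)],
     [Fr(0), Fr(1, 3)]]


def phi(alpha):
    """Schrodinger picture: map an input state to an (unnormalized) output state."""
    return [sum(M[j][i] * alpha[i] for i in range(len(alpha))) for j in range(len(M))]


def tau(a):
    """Heisenberg picture: pull an output effect back, tau(a) = a o phi."""
    return [sum(a[j] * M[j][i] for j in range(len(a))) for i in range(len(M[0]))]


def pair(v, w):
    """Evaluate a functional on a vector (both given by coordinates)."""
    return sum(x * y for x, y in zip(v, w))


alpha = [Fr(2, 5), Fr(3, 5)]       # a normalized input state
a = [Fr(1), Fr(0), Fr(1, 2)]       # an effect on the output: 0 <= a <= u
u_out = [Fr(1)] * 3                # the output order unit
print(pair(a, phi(alpha)) == pair(tau(a), alpha))  # True: a(phi(alpha)) = tau(a)(alpha)
print(pair(tau(u_out), alpha))     # 7/10, the probability that the process occurs
```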

Not every positive linear mapping $V(A) \to V(B)$ will generally count as a process. As remarked above, it is part of the job of a probabilistic theory to specify those that do. However, it seems reasonable to require that convex combinations of processes and composites of (composable) processes also count as processes. It will also be convenient to assume that, for every pair of systems $A$ and $B$, there is a null process that takes every state $\alpha \in \Omega(A)$ to the zero state $0 \in V(B)$. It seems reasonable, also, that there exist a canonical trivial system $I$, corresponding to a test space with only a single outcome, $1$, and a single test $\{1\}$. We then have $E(I) = E(I)^* = \mathbb{R}$. We can then require that, for every normalized state $\alpha \in V(A)$, there exist a process $\mathbb{R} \to V(A)$ of preparation, given by $1 \mapsto \alpha$, and, for every outcome $x \in X(A)$, a process $V(A) \to \mathbb{R}$ of registration, sending $\alpha \in V(A)$ to $\alpha(x)$. The dual process corresponding to the preparation of $\alpha$ is simply the state $\alpha$ itself, while the process dual to the registration of $x$ is the linear mapping $\mathbb{R} \to E(A)$ sending $1$ to $x$. All of this suggests the following:

Definition 8. A (state-complete) probabilistic theory⁹ is a category $\mathcal{C}$ such that

(1) Every object $A \in \mathcal{C}$ is a probabilistic model;

(2) For all $A, B \in \mathcal{C}$, the set $\mathcal{C}(A, B)$ of morphisms $A \to B$ is a closed, convex subset of $L_+(E(A), E(B))$, containing the zero mapping, and with $\tau(u_A) \le u_B$ for all $\tau \in \mathcal{C}(A, B)$;

(3) There is a distinguished trivial system $I$ with $E(I) = \mathbb{R}$ and $X(I) = \{1\}$, such that for every $A \in \mathcal{C}$, $X(A) \subseteq \mathcal{C}(I, A)$ and $\Omega(A) \subseteq \mathcal{C}(A, I)$;

(4) The order unit $u_A \in E(A)$ belongs to $\mathcal{C}(I, A)$.

From now on, we work in a fixed probabilistic theory $\mathcal{C}$ of this kind. We write $\mathcal{C}^*$ for the category having the same objects, but with morphisms $\mathcal{C}^*(A, B)$ the set of mappings $\phi = \tau^* : V(A) \to V(B)$ with $\tau \in \mathcal{C}(B, A)$. In effect, $\mathcal{C}$ and $\mathcal{C}^*$ offer, respectively, the "Heisenberg" and the "Schrödinger" picture of the same theory. Depending on context, we shall understand the word "process" to refer either to a morphism $\tau \in \mathcal{C}(A, B)$ for some $A, B \in \mathcal{C}$, or to the dual mapping $\phi = \tau^* : V(B) \to V(A)$.

Example 8. By a standard finite-dimensional quantum theory, we mean a category $\mathcal{C}$ of probabilistic models $(E, X)$ where $E$ is the hermitian part of a finite-dimensional complex matrix algebra (a direct sum of algebras of the form $L(\mathcal{H})$), with trace-nonincreasing completely positive mappings as morphisms. In this formulation, classical probabilistic theories arise as the degenerate case in which all of the matrix algebras associated with systems in $\mathcal{C}$ are commutative.

Reversible and Probabilistically Reversible Processes A process $\tau \in \mathcal{C}(A, B)$ is reversible iff it is invertible as a morphism in $\mathcal{C}$, i.e., there exists an inverse process $\tau^{-1} \in \mathcal{C}(B, A)$ with $\tau^{-1} \circ \tau = \mathrm{id}_A$ and $\tau \circ \tau^{-1} = \mathrm{id}_B$. In this case, $\tau$ is an order-isomorphism $E(A) \simeq E(B)$, and $\tau^{-1} : E(B) \simeq E(A)$ is the inverse isomorphism. Moreover, for such a process, we have $\tau(u_A) = u_B$: by assumption, $\tau(u_A) \le u_B$, and also $\tau^{-1}(u_B) \le u_A$, whence, as $\tau$ preserves order, $u_B \le \tau(u_A)$. Dually, a process $\phi \in \mathcal{C}^*(A, B)$ is reversible iff it has an inverse in $\mathcal{C}^*(B, A)$; equivalently, $\phi$ is invertible iff the dual process $\tau = \phi^*$ is invertible. In this case, we have $u_B(\phi(\alpha)) = 1$ for every normalized state $\alpha \in \Omega(A)$.

There is a weaker but very useful notion, which we shall call probabilistic reversibility. This is 
slightly easier to describe in terms of processes acting on states, rather than effects: 

Definition 9. A process $\phi \in \mathcal{C}^*(A, B)$ is probabilistically reversible iff it is invertible as a linear mapping $V(A) \to V(B)$, with a positive inverse, and the inverse mapping $\phi^{-1}$ is a positive multiple of a process $\psi \in \mathcal{C}^*(B, A)$, say $\phi^{-1} = c\psi$ with $c > 0$.

Operationally, this means that there is some non-zero probability that $\psi \circ \phi$ will return the system to its original state. Indeed,

$$(\psi(\phi(\alpha)))(u_A) = c^{-1}\,\phi^{-1}(\phi(\alpha))(u_A) = c^{-1}\alpha(u_A) = c^{-1},$$

so this probability is exactly $1/c$. In particular, $\phi$ is reversible with probability one iff $c = 1$, so that $\phi^{-1} = \psi$ is a process in $\mathcal{C}^*(B, A)$; in other words, $\phi$ is a reversible process.
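Here is a minimal classical illustration of probabilistic reversibility, under the assumption that processes on a bit are substochastic maps. The hypothetical process `phi` swaps the two outcomes with probability $1/2$ and otherwise discards the system; its linear inverse is $2 \cdot \mathrm{swap} = c\psi$ with $c = 2$ and $\psi$ the (reversible) swap, so $\psi \circ \phi$ restores the input state with probability $1/c = 1/2$.

```python
from fractions import Fraction as Fr


def swap(alpha):
    """A reversible classical process on a bit: exchange the two outcomes."""
    return [alpha[1], alpha[0]]


def phi(alpha):
    """Hypothetical lossy process: swap with probability 1/2, otherwise discard."""
    return [Fr(1, 2) * x for x in swap(alpha)]


def phi_inverse(beta):
    """The linear inverse of phi; it is positive and equals c * psi with c = 2, psi = swap."""
    return [2 * x for x in swap(beta)]


c, psi = 2, swap
alpha = [Fr(1, 3), Fr(2, 3)]
assert phi_inverse(phi(alpha)) == alpha   # phi is invertible as a linear map

restored = psi(phi(alpha))                # run phi, then psi
success = sum(restored)                   # u_A(psi(phi(alpha)))
print(success)                            # 1/2, i.e. 1/c
print([x / success for x in restored] == alpha)  # True: given success, alpha is recovered
```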

9 This definition differs from that of [17], most obviously in that objects are associated with effect spaces, rather than state spaces, but also in taking the test space $X(A)$ to be part of the structure of $A \in \mathcal{C}$.






We shall say that a process $\tau \in \mathcal{C}(A, B)$ is reversible with probability $1/c$ iff its dual $\tau^*$ is probabilistically reversible, with $(\tau^*)^{-1} = c\psi$ as in Definition 9. Obviously, the set of probabilistically reversible processes, in either $\mathcal{C}(A, A)$ or $\mathcal{C}^*(A, A)$, is a group, containing, but larger than, the group of all reversible processes on $A$.

Historical remarks: The representation of what we are calling probabilistic models in terms of an order-unit space and its dual goes back at least to the work of Davies and Lewis [27] and Edwards [29]. A good survey of the relevant functional analysis can be found in [2]. Test spaces (originally called "manuals") were the basis for a generalized probability theory (and an associated "empirical logic") developed in the 1970s and 80s by C. H. Randall and D. J. Foulis and their students. See [71] for a survey. Mathematically, of course, a test space is just a hypergraph; the current terminology serves only to reinforce the intended probabilistic interpretation.



3 Composition and Entanglement 

Consider two systems, $A$ and $B$, which are not interacting in any obvious, causal sense: for example, systems occupying space-like separated regions of space-time. In this situation, it seems reasonable to assume that whatever can happen to each system individually (the preparation of a state, the making of a measurement, etc.) can happen to both together, independently.

Another natural (albeit more contingent) requirement is a no-signaling condition, forbidding the transmission of information from $A$ to $B$, or vice versa, by the mere decision to make one measurement rather than another on $A$, or on $B$. As we'll see, the phenomenon of entanglement, one of the supposed hallmarks of quantum theory, is actually a rather generic feature of such "non-signaling" composite systems in non-classical probabilistic theories, whether "quantum" or otherwise. (Indeed, the phenomenon even arises in otherwise quite classical theories involving a restricted set of probability weights.)

3.1 Composites of Models 

Suppose two parties, say Alice and Bob, control, respectively, systems $A$ and $B$, which occur as components of some composite system $AB$, but are still sufficiently isolated to be prepared and measured separately. At a very minimum, we would expect that Alice's making a measurement, $E$, on her part of the composite system, and Bob's making a measurement, $F$, on his part, together constitute the making of a measurement on the combined system. We would also expect that states of the two component systems can be prepared independently. Formalizing these requirements, we arrive at the following:

Definition 10. A composite of two probabilistic models $A$ and $B$ is a model $AB$, together with a mapping

$$X(A) \times X(B) \to X(AB) : (x, y) \mapsto xy$$

such that

(i) for all tests $E \in \mathcal{M}(A)$ and $F \in \mathcal{M}(B)$, the product test $EF := \{xy \mid x \in E,\ y \in F\}$ belongs to $\mathcal{M}(AB)$; and

(ii) for all states $\alpha \in \Omega(A)$ and $\beta \in \Omega(B)$, there exists a state $\alpha \otimes \beta \in \Omega(AB)$ with $(\alpha \otimes \beta)(xy) = \alpha(x)\beta(y)$.
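To see conditions (i) and (ii) at work, the following Python sketch forms the product tests $EF$ and the product weight $(\alpha \otimes \beta)(xy) = \alpha(x)\beta(y)$ for a square bit $A$ and a classical bit $B$ (the outcome labels and weights are hypothetical). It checks only that $\alpha \otimes \beta$ sums to 1 on every product test; whether it lies in $\Omega(AB)$ depends on which composite is chosen.

```python
from fractions import Fraction as Fr
from itertools import product

tests_A = [("x0", "x1"), ("y0", "y1")]   # hypothetical square-bit tests
tests_B = [("z0", "z1")]                 # hypothetical classical bit


def product_tests(tests_A, tests_B):
    """All product tests EF, with the product outcome xy written as the pair (x, y)."""
    return [tuple(product(E, F)) for E in tests_A for F in tests_B]


def product_weight(alpha, beta):
    """The product weight (alpha (x) beta)(xy) = alpha(x) * beta(y)."""
    return lambda xy: alpha[xy[0]] * beta[xy[1]]


alpha = {"x0": Fr(1, 3), "x1": Fr(2, 3), "y0": Fr(1, 2), "y1": Fr(1, 2)}
beta = {"z0": Fr(1, 4), "z1": Fr(3, 4)}
gamma = product_weight(alpha, beta)
totals = [sum(gamma(xy) for xy in EF) for EF in product_tests(tests_A, tests_B)]
print(totals)  # each product test gets total probability 1
```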

Remarks: There are several ways in which we might plausibly weaken this definition. For instance, we might require only that the product outcome $xy$ be an effect in $E(AB)_+$, and the set $EF$ an observable, but not necessarily a test, of $AB$. Such possibilities are worth bearing in mind. However, for the purposes of this survey, it seems reasonable to use the more restrictive, but therefore simpler, definition above. Note that in (ii) we require only the existence, but not the uniqueness, of product states (where a product state for $\alpha$ and $\beta$ is defined as a state $\gamma$ with $\gamma(xy) = \alpha(x)\beta(y)$, and a product state tout court as one that is a product state for some pair of states $\alpha$ and $\beta$).

10 More radically, one might consider models of systems interacting in such a way that the making of a particular measurement, or the preparation of a particular state, on one component, precludes the making of certain measurements, or the preparation of certain states, on the other component. Mathematically, such situations are certainly possible.

The injectivity of the mapping $(x, y) \mapsto xy$ (which we assume throughout) allows us to identify $X(A) \times X(B)$ with the set of product outcomes in $X(AB)$. Let us write

$$X(A)X(B) := \{xy \mid x \in X(A),\ y \in X(B)\}$$

for this set. With a slight abuse of notation, we may write $\mathcal{M}(A) \times \mathcal{M}(B)$ for the test space consisting of product tests $EF$. Condition (i) asserts that $\mathcal{M}(A) \times \mathcal{M}(B)$ is contained in $\mathcal{M}(AB)$, so every state $\omega \in \Omega(AB)$ restricts to a state $\tilde{\omega}$ on the former. Where the restricted state $\tilde{\omega}$ determines the global state $\omega$ (that is, where the set $X(A)X(B)$ of product outcomes is state-separating), we say that the composite is locally tomographic. In this setting, the joint probabilities of outcomes of measurements on the component systems $A$ and $B$ completely determine the state of the composite.¹¹ This is a reasonable, but also a rather strong, restriction. Indeed, while composites in standard complex QM are locally tomographic, this is not the case for real or quaternionic QM. We'll return to this matter below.

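Example 9 (Composite quantum models). If $A(\mathcal{H})$ and $A(\mathcal{K})$ are two quantum-mechanical models, associated with finite-dimensional Hilbert spaces $\mathcal{H}$ and $\mathcal{K}$, respectively, let

$$A(\mathcal{H})A(\mathcal{K}) = A(\mathcal{H} \otimes \mathcal{K}),$$

the model associated with $\mathcal{H} \otimes \mathcal{K}$. That is, $\mathcal{M}(\mathcal{H} \otimes \mathcal{K})$ consists of orthonormal bases for $\mathcal{H} \otimes \mathcal{K}$, while $\Omega(\mathcal{H} \otimes \mathcal{K})$ consists of density operators on $\mathcal{H} \otimes \mathcal{K}$. If $x \in \mathcal{H}$ and $y \in \mathcal{K}$ are unit vectors, then $x \otimes y$ is a unit vector in $\mathcal{H} \otimes \mathcal{K}$. It is easy to check that $(x, y) \mapsto x \otimes y$ makes $A(\mathcal{H} \otimes \mathcal{K})$ into a composite in the sense of the preceding definition.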



3.2 Non-Signaling Composites and Entanglement 

The very broad definition of a composite system given above leaves room for situations in which the probability of Bob's obtaining an outcome $y$ will depend on which test $E \in \mathcal{M}(A)$ Alice chooses to measure. This is plausible only in scenarios in which Alice's measurements are able physically to disturb Bob's system. If we wish to model composites in which the two systems $A$ and $B$ are sufficiently isolated from one another that this kind of remote disturbance is ruled out (the obvious situation being one in which $A$ and $B$ are spacelike separated), then we must impose a further constraint.

Definition 11. A probability weight $\omega$ on $\mathcal{M}(A) \times \mathcal{M}(B)$ is non-signaling iff it has well-defined marginal (or reduced) states, in the sense that

$$\omega_1(x) := \sum_{y \in F} \omega(xy) \quad\text{and}\quad \omega_2(y) := \sum_{x \in E} \omega(xy)$$

are independent of the choice of tests $E \in \mathcal{M}(A)$, $F \in \mathcal{M}(B)$.
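Definition 11 can be checked mechanically on a finite product test space. The sketch below assumes two tests per party, each with outcomes 0 and 1, so that a weight is a function $w(s, o_1, t, o_2)$ of Alice's test $s$ and outcome $o_1$ and Bob's test $t$ and outcome $o_2$. It confirms that the well-known Popescu-Rohrlich (PR) correlations are non-signaling, while a hypothetical weight in which Bob's outcome copies Alice's choice of test is not.

```python
from fractions import Fraction as Fr
from itertools import product

TESTS = (0, 1)    # assumed: two tests per party
VALUES = (0, 1)   # assumed: two outcomes per test


def pr_box(s, o1, t, o2):
    """Popescu-Rohrlich correlations: o1 XOR o2 == s AND t, each case with probability 1/2."""
    return Fr(1, 2) if (o1 ^ o2) == (s & t) else Fr(0)


def signaling(s, o1, t, o2):
    """Hypothetical signaling weight: Bob's outcome equals Alice's choice of test s."""
    return Fr(1, 2) if o2 == s else Fr(0)


def is_non_signaling(w):
    """Check Definition 11: each marginal is independent of the other party's test."""
    for s, o1 in product(TESTS, VALUES):
        if len({sum(w(s, o1, t, o2) for o2 in VALUES) for t in TESTS}) > 1:
            return False
    for t, o2 in product(TESTS, VALUES):
        if len({sum(w(s, o1, t, o2) for o1 in VALUES) for s in TESTS}) > 1:
            return False
    return True


print(is_non_signaling(pr_box), is_non_signaling(signaling))  # True False
```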

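If $\omega \in \Omega(AB)$ is non-signaling, then for every $y \in X(B)$ and $x \in X(A)$ with $\omega_2(y) > 0$ and $\omega_1(x) > 0$, we can define the conditional states $\omega_{1|y}$ and $\omega_{2|x}$ on $A$ and $B$, respectively, by

$$\omega_{1|y}(x) := \frac{\omega(xy)}{\omega_2(y)} \quad\text{and}\quad \omega_{2|x}(y) := \frac{\omega(xy)}{\omega_1(x)}.$$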

These are well-defined probability weights on $\mathcal{M}(A)$ and $\mathcal{M}(B)$, respectively. It would seem reasonable to include them in the state spaces of $A$ and $B$. Therefore, we adopt the following language:



11 Barrett [19] calls this the global state hypothesis; the term locally tomographic seems to have become more standard.

Definition 12. A non-signaling composite of $A$ and $B$ is a composite $AB$ in which all states are non-signaling, and all conditional states belong to the designated state spaces of $A$ and $B$; that is, $\omega_{2|x} \in \Omega(B)$ and $\omega_{1|y} \in \Omega(A)$ for all $x \in X(A)$ and $y \in X(B)$ (with $\omega_1(x), \omega_2(y) > 0$).

This has a strong consequence [57]:

Lemma 3 (Bi-Linearization). Let $AB$ be a non-signaling composite of $A$ and $B$. Then every state $\omega \in \Omega(AB)$ extends uniquely to a bilinear form on $E(A) \times E(B)$.

Proof: For every $x \in X(A)$, define $\hat{\omega}(x) \in \mathbb{R}^{X(B)}$ by $\hat{\omega}(x)(y) = \omega(xy)$. Notice that $\omega_{2|x} = \hat{\omega}(x)/\omega_1(x)$. Since the conditional state $\omega_{2|x}$ belongs to $\Omega(B)$, we have $\hat{\omega}(x) \in V(B) = E(B)^*$, with $\sum_{x \in E} \hat{\omega}(x) = \omega_2$ for any test $E \in \mathcal{M}(A)$. Dualizing (and remembering that $E(A)$ is finite-dimensional), we have a linear mapping $\hat{\omega}^* : E(B) \to \mathbb{R}^{X(A)}$. Now, $\hat{\omega}^*(y) = \omega_2(y)\,\omega_{1|y}$; the latter belongs to $\Omega(A)$, so $\hat{\omega}^*(y) \in V(A) = E(A)^*$ for every $y \in X(B)$. Since $X(B)$ spans $E(B)$, it follows that the range of $\hat{\omega}^*$ lies in $V(A)$, i.e., we can regard $\hat{\omega}^*$ as a linear mapping $E(B) \to V(A) = E(A)^*$. Equivalently, we have a bilinear form $B_\omega(a, b) = \hat{\omega}^*(b)(a)$, which evidently satisfies $B_\omega(x, y) = \omega(xy)$ for all $x \in X(A)$, $y \in X(B)$. Since $X(A)$ and $X(B)$ span $E(A)$ and $E(B)$, the form $B_\omega$ is uniquely determined by this property. □

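It follows that, for a non-signaling composite, the mapping $X(A) \times X(B) \to X(AB) : (x, y) \mapsto xy$ gives rise to a linear mapping $E(A) \otimes E(B) \to E(AB)$ sending $x \otimes y$ to $xy$; writing $a \otimes b$ also for the image of $a \otimes b$ under this mapping, we have $\omega(x \otimes y) = B_\omega(x, y) = \omega(xy)$ for every $\omega \in \Omega(AB)$. The composite $AB$ is locally tomographic iff this mapping is surjective.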

Corollary 1. A non-signaling composite $AB$ of models $A$ and $B$ is locally tomographic iff $E(AB) \simeq E(A) \otimes E(B)$, that is, $\dim(E(AB)) = \dim(E(A))\dim(E(B))$.
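The dimension criterion of Corollary 1 can be illustrated for real and complex quantum theory. The sketch below assumes the standard descriptions of $E(\mathbb{C}^n)$ as the $n^2$-dimensional space of complex Hermitian matrices and of $E(\mathbb{R}^n)$ as the $n(n+1)/2$-dimensional space of real symmetric matrices, with the composite of $\mathbb{F}^m$ and $\mathbb{F}^n$ given by $\mathbb{F}^{mn}$; the quaternionic case is omitted. It reproduces the remark that complex composites are locally tomographic while real ones are not.

```python
def dim_self_adjoint(n, field):
    """Real dimension of the space of n x n self-adjoint matrices over R or C."""
    if field == "R":
        return n * (n + 1) // 2   # real symmetric matrices
    if field == "C":
        return n * n              # complex Hermitian matrices
    raise ValueError("field must be 'R' or 'C'")


def compare_dims(m, n, field):
    """Return (dim E(AB), dim E(A) * dim E(B)) for A = F^m, B = F^n, AB = F^(mn)."""
    return dim_self_adjoint(m * n, field), dim_self_adjoint(m, field) * dim_self_adjoint(n, field)


print(compare_dims(2, 2, "C"))  # (16, 16): consistent with local tomography
print(compare_dims(2, 2, "R"))  # (10, 9): two real qubits are not locally tomographic
```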

Lemma 3 allows us to extend the definition of conditional states to arbitrary effects, setting

$$\omega_{1|b}(a) = \frac{\omega(a \otimes b)}{\omega(u \otimes b)} \quad\text{and}\quad \omega_{2|a}(b) = \frac{\omega(a \otimes b)}{\omega(a \otimes u)}$$

for arbitrary effects $a \in E(A)$ and $b \in E(B)$ (with the usual proviso about division by zero). The following bipartite version of the law of total probability is easily verified:

Lemma 4 (Law of Total Probability). Let $AB$ be a non-signaling composite of $A$ and $B$; let $\omega$ be any state on $AB$, and let $E$ and $F$ be any two observables on $E(A)$ and $E(B)$, respectively. Then

$$\omega_2 = \sum_{a \in E} \omega_1(a)\,\omega_{2|a} \quad\text{and}\quad \omega_1 = \sum_{b \in F} \omega_2(b)\,\omega_{1|b}.$$
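For a classical composite (where every joint distribution is non-signaling), Lemma 4 can be verified with exact arithmetic. In this sketch the joint distribution `W` and the unsharp observable $\{a_1, a_2\}$ on $A$ are hypothetical; `omega` is the bilinear extension from Lemma 3, and `cond_B` computes $\omega_{2|a}$.

```python
from fractions import Fraction as Fr

# Hypothetical joint distribution omega(x_i y_j) of a classical composite (3 x 2 outcomes).
W = [[Fr(1, 6), Fr(1, 12)],
     [Fr(1, 4), Fr(1, 12)],
     [Fr(1, 12), Fr(1, 3)]]
u_A, u_B = [Fr(1)] * 3, [Fr(1)] * 2


def point(j, n=2):
    """The sharp effect on B that is 1 on outcome j and 0 elsewhere."""
    return [Fr(int(k == j)) for k in range(n)]


def omega(a, b):
    """Bilinear extension of omega to effects a on A and b on B (Lemma 3)."""
    return sum(a[i] * W[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def cond_B(a):
    """The conditional state omega_{2|a}(b) = omega(a (x) b) / omega(a (x) u)."""
    norm = omega(a, u_B)
    return [omega(a, point(j)) / norm for j in range(2)]


# A hypothetical unsharp observable {a1, a2} on A with a1 + a2 = u_A.
a1 = [Fr(1), Fr(1, 2), Fr(0)]
a2 = [Fr(0), Fr(1, 2), Fr(1)]
marginal_B = [omega(u_A, point(j)) for j in range(2)]
mixture = [sum(omega(a, u_B) * cond_B(a)[j] for a in (a1, a2)) for j in range(2)]
print(marginal_B == mixture)  # True: omega_2 = sum_a omega_1(a) omega_{2|a}
```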

Corollary 2. Let $AB$ be a non-signaling composite of $A$ and $B$, and let $\omega$ be a pure state of $AB$. If the marginal state $\omega_2$ is pure, then $\omega_1$ is also pure, and $\omega = \omega_1 \otimes \omega_2$.

Proof: It is easy to see that, if a product state $\omega = \omega_1 \otimes \omega_2$ is pure, then both marginals must be pure. Now suppose that one marginal state, say $\omega_2$, is pure. Since $\omega_2 = \sum_{x \in E} \omega_1(x)\,\omega_{2|x}$, and the conditional states $\omega_{2|x}$ belong to $\Omega(B)$, it follows that for every $x \in E$ with $\omega_1(x) > 0$, we must have $\omega_{2|x} = \omega_2$, so that $\omega(xy) = \omega_1(x)\omega_2(y)$ for every such $x$. The same result holds trivially if $\omega_1(x) = 0$, so we have $\omega(xy) = \omega_1(x)\omega_2(y)$ for all choices of $x$ and $y$. It follows that $\omega = \omega_1 \otimes \omega_2$. □

Definition 13. A state $\omega$ on $AB$ is separable iff it is a mixture of product states, that is, $\omega = \sum_i t_i\, \alpha_i \otimes \beta_i$, where $t_i \ge 0$ and $\sum_i t_i = 1$. A state not of this form is said to be entangled.

Using this language, the preceding Corollary gives us 

Corollary 3. If $AB$ is a non-signaling composite of models $A$ and $B$, and $\omega$ is an entangled state of $AB$, then both $\omega_1$ and $\omega_2$ are mixed.
This is often regarded as the hallmark of entangled quantum states; but, as we see, it is really a quite general possibility arising in any non-classical probabilistic setting. Of course, one can still ask at this point whether entangled states exist in any generality, once one leaves the confines of quantum theory. However, as we'll see in Section 3.4 below, there is a sense in which most non-signaling composites of non-classical models admit entangled states.

The CHSH Inequality Let $AA$ be a non-signaling composite of two copies of $A$. For any $a, a', b, b' \in E(A)$ with $-u_A \le a, a', b, b' \le u_A$, and for any state $\omega$ of $AA$, define

$$S(\omega; a, a', b, b') = \omega(a, b) + \omega(a, b') + \omega(a', b) - \omega(a', b').$$

This is called the CHSH (Clauser-Horne-Shimony-Holt) parameter associated with $\omega$ and these four elements. If $\omega$ is a product state, then $S \le 2$ for all such choices of $a, a', b, b'$; as $S$ is affine in $\omega$, it follows that $S \le 2$ for all separable states. For entangled states it can be larger. A priori, the upper bound for $S$ is 4, and this is achieved, for example, if $A$ is the "square bit" of Example 3. However, for bipartite quantum states, the upper bound is much lower. As pointed out by Tsirel'son [64], $S \le 2\sqrt{2}$ for any quantum bipartite state and any such choice of $a, a', b, b'$. A great deal of work has gone into trying to find a deeper explanation for this bound [31] [53]. In Section 4, we will return to this matter.
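The following sketch evaluates $S$ on a standard quantum example that attains Tsirel'son's bound: the maximally entangled two-qubit state $(|00\rangle + |11\rangle)/\sqrt{2}$, with $a = Z$, $a' = X$, $b = (Z+X)/\sqrt{2}$ and $b' = (Z-X)/\sqrt{2}$ (all satisfying $-u \le \cdot \le u$), using the CHSH combination $\omega(a,b) + \omega(a,b') + \omega(a',b) - \omega(a',b')$. All matrices involved are real, so plain Python lists suffice; these particular choices are illustrative. For comparison, it also evaluates the product state $|00\rangle$.

```python
import math


def kron(A, B):
    """Kronecker (tensor) product of two square matrices given as lists of rows."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def expectation(psi, M):
    """<psi|M|psi> for a real state vector psi and a real matrix M."""
    Mpsi = [sum(M[i][j] * psi[j] for j in range(len(psi))) for i in range(len(psi))]
    return sum(p * q for p, q in zip(psi, Mpsi))


def combine(A, B, s):
    """The matrix (A + s*B) / sqrt(2)."""
    return [[(A[i][j] + s * B[i][j]) / math.sqrt(2) for j in range(2)] for i in range(2)]


Z = [[1.0, 0.0], [0.0, -1.0]]
X = [[0.0, 1.0], [1.0, 0.0]]
a, a_prime = Z, X
b, b_prime = combine(Z, X, 1), combine(Z, X, -1)


def chsh(psi):
    """S = w(a, b) + w(a, b') + w(a', b) - w(a', b'), where w(p, q) = <psi|p (x) q|psi>."""
    def w(p, q):
        return expectation(psi, kron(p, q))
    return w(a, b) + w(a, b_prime) + w(a_prime, b) - w(a_prime, b_prime)


r = 1 / math.sqrt(2)
phi_plus = [r, 0.0, 0.0, r]         # (|00> + |11>)/sqrt(2)
zero_zero = [1.0, 0.0, 0.0, 0.0]    # the product state |00>
print(chsh(phi_plus), 2 * math.sqrt(2))  # both approximately 2.8284
print(chsh(zero_zero))                   # approximately 1.4142, within the bound 2
```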

Conditioning Maps and Isomorphism States If $\omega$ is any non-signaling state on $AB$, then the associated bilinear form $B_\omega$ on $E(A) \times E(B)$ gives us a positive linear mapping

$$\hat{\omega} : E(A) \to E(B)^*$$

defined by

$$\hat{\omega}(a)(b) = \omega(a \otimes b)$$

for all $a \in E(A)$ and $b \in E(B)$. Notice that $\hat{\omega}(a) = \omega_1(a)\,\omega_{2|a}$. Accordingly, we think of $\hat{\omega}(a)$ as an un-normalized conditional state of $B$ given the effect $a \in E(A)$, and refer to $\hat{\omega}$ as the conditioning map associated with $\omega$. Of course, there is also a conditioning map running in the opposite direction. In fact, this is just the adjoint of $\hat{\omega}$; that is, $\hat{\omega}^*(b)(a) = \hat{\omega}(a)(b) = \omega(a, b)$ for all effects $a \in E(A)$ and $b \in E(B)$.

There is a dual construction for effects. An effect $f \in E(AB)$ defines a positive bilinear form on $V(A) \times V(B)$ by $(\alpha, \beta) \mapsto f(\alpha \otimes \beta)$. This, in turn, yields a positive linear mapping

$$\hat{f} : V(A) \to V(B)^* = E(B)$$

given by $\hat{f}(\alpha)(\beta) = f(\alpha \otimes \beta)$. We call $\hat{f}$ the co-conditioning map associated with $f$.

Definition 14. Let $AB$ be a non-signaling composite of $A$ and $B$. An isomorphism state on $AB$ is a state $\omega \in \Omega(AB)$ such that the conditioning map $\hat{\omega} : E(A) \to V(B)$ is an order-isomorphism. Dually, an isomorphism effect is an effect $f \in E(AB)$ such that the co-conditioning map $\hat{f} : V(A) \to E(B)$ is an order-isomorphism.

Evidently, the inverse of an isomorphism state is a multiple of an isomorphism effect, and vice versa. This point will be important in the discussion of teleportation protocols below. If there exists an isomorphism state on a composite $AA$ of $A$ with itself, then we have $E(A) \simeq V(A) = E(A)^*$. More generally, we shall say that $A$ is weakly self-dual iff there exists an order-isomorphism $E(A) \simeq V(A)$ (equivalently: an isomorphism state in $A \otimes_{\max} A$). Although this is a strong constraint on the structure of a probabilistic model, it is nevertheless satisfied by many examples that are neither quantum nor classical. For example, the models associated with state spaces that are regular 2-dimensional polytopes, that is, regular $n$-gons, are weakly self-dual.

As we'll discuss further in Section 5, quantum models satisfy a much stronger form of self-duality: not only does there exist an order-isomorphism $V(\mathcal{H}) \simeq E(\mathcal{H})$, but this is given by an inner product on $E(\mathcal{H}) = L_h(\mathcal{H})$, namely, $a \mapsto \mathrm{Tr}(a\,\cdot)$.

¹² The converse is not quite true: an order-isomorphism $E(A) \simeq V(A)$ defines a non-signaling state on $A \otimes_{\max} A$,
but need not correspond to a state of a given composite AA.

Proposition 4 ([14]). Let A and B be irreducible, and let AB be any locally tomographic, non-signaling
composite of A with B. Then any isomorphism state in AB is pure in $\Omega(AB)$, and any
isomorphism effect is extremal in $E(AB)_+$.

If A and B are not irreducible, an isomorphism state on AB need not be pure. For example,
if $A = B = (E, \Delta(E))$, then any state uniformly correlating A and B, say $\omega(x, x) = 1/|E|$ and
$\omega(x, y) = 0$ for $x \neq y$, is an isomorphism state, but will be pure only if $|E| = 1$.
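To make the classical example concrete, here is a small Python sketch of our own (not from the source) for $|E| = 3$, with states and effects represented as lists of exact fractions indexed by outcomes. It checks that the conditioning map of the uniformly correlating state is just multiplication by $1/|E|$, hence an order-isomorphism, and that the state is the equal-weight mixture of the product states $\delta_x \otimes \delta_x$, so it is not pure. The helper names are hypothetical.

```python
from fractions import Fraction


def uniform_correlation(n):
    # omega(x, y) = 1/n if x == y else 0: a joint state on the classical composite E x E
    return [[Fraction(1, n) if x == y else Fraction(0) for y in range(n)] for x in range(n)]


def conditioning_map(omega, a):
    # omega_hat(a)(y) = sum_x a(x) omega(x, y), an un-normalized state of B
    n = len(omega)
    return [sum(a[x] * omega[x][y] for x in range(n)) for y in range(n)]


def point_mass(x, n):
    # The pure state delta_x of an n-outcome classical test
    return [Fraction(1) if i == x else Fraction(0) for i in range(n)]


n = 3
omega = uniform_correlation(n)
a = [Fraction(1), Fraction(1, 2), Fraction(0)]   # an arbitrary effect on A, 0 <= a(x) <= 1

# The conditioning map is multiplication by 1/n, so it is an order-isomorphism
print(conditioning_map(omega, a) == [ax / n for ax in a])

# omega is the equal-weight mixture of the product states delta_x (x) delta_x, hence not pure
mixture = [[sum(Fraction(1, n) * point_mass(x, n)[i] * point_mass(x, n)[j] for x in range(n))
            for j in range(n)] for i in range(n)]
print(mixture == omega)
```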

3.3 Quantum Composites 

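This is a good place at which to pause for a second, more detailed look at quantum-mechanical
composites. As noted in an earlier example, the mapping $X(\mathcal{H}) \times X(\mathcal{K}) \to X(\mathcal{H} \otimes \mathcal{K})$ given by
$(x, y) \mapsto x \otimes y$ turns $A(\mathcal{H} \otimes \mathcal{K})$ into a composite of the models $A(\mathcal{H})$ and $A(\mathcal{K})$. This mapping
extends to the bilinear mapping

$E(\mathcal{H}) \times E(\mathcal{K}) = \mathcal{L}_h(\mathcal{H}) \times \mathcal{L}_h(\mathcal{K}) \to \mathcal{L}_h(\mathcal{H} \otimes \mathcal{K}) = E(\mathcal{H} \otimes \mathcal{K})$

that sends $(a, b) \in \mathcal{L}_h(\mathcal{H}) \times \mathcal{L}_h(\mathcal{K})$ to the operator $a \otimes b$ on $\mathcal{H} \otimes \mathcal{K}$, given by $(a \otimes b)(x \otimes y) = ax \otimes by$
for all $x \in \mathcal{H}$, $y \in \mathcal{K}$. Hence, by Lemma 3, $A(\mathcal{H} \otimes \mathcal{K})$ is a non-signaling product of $A(\mathcal{H})$ and $A(\mathcal{K})$.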

Conditioning. Let $\mathcal{H}$ be a complex Hilbert space. For any vectors $x, y \in \mathcal{H}$, let $x \odot y$ denote the
rank-one operator on $\mathcal{H}$ given by $(x \odot y)z = \langle z, y \rangle x$. (In Dirac notation, this is $|x\rangle\langle y|$.) If $x$ is a unit
vector, then $x \odot x = P_x$, the orthogonal projection operator associated with $x$.

The mapping $(x, y) \mapsto x \odot y$ is sesquilinear, that is, linear in its first and conjugate-linear in its
second argument; it therefore extends to a linear mapping $\mathcal{H} \otimes \bar{\mathcal{H}} \to \mathcal{L}(\mathcal{H})$, where $\bar{\mathcal{H}}$ is the conjugate
space of $\mathcal{H}$, taking any vector $v = \sum_i x_i \otimes \bar{y}_i$ to the corresponding operator $\hat{v} := \sum_i x_i \odot y_i$. It
is easy to see that this is injective and hence, on dimensional grounds, an isomorphism. It is useful
to note that

$\langle \hat{v}(x), y \rangle = \langle v, y \otimes \bar{x} \rangle$

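for all $x, y \in \mathcal{H}$. Hence, if $v$ is any unit vector in $\mathcal{H} \otimes \bar{\mathcal{H}}$, the corresponding pure state $\omega = \omega_v$ of
$A(\mathcal{H} \otimes \bar{\mathcal{H}})$ assigns joint probabilities to outcomes $x \in X(\mathcal{H})$ and $\bar{y} \in X(\bar{\mathcal{H}})$ by

$\omega(x, \bar{y}) = |\langle v, x \otimes \bar{y} \rangle|^2 = |\langle \hat{v}(y), x \rangle|^2,$

so that the conditional state $\omega_{1|\bar{y}}$ is exactly the pure state associated with the unit vector $\hat{v}(y)/\|\hat{v}(y)\|$.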
(The fact that conditioning a pure bipartite quantum state by a measurement outcome always leads
to a pure state, the pure conditioning property, is rather special, and has been exploited in [?].)
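As a numerical sanity check of the formula $\omega(x, \bar{y}) = |\langle \hat{v}(y), x\rangle|^2$, the following Python sketch takes $\mathcal{H} = \mathbb{C}^2$ with a hypothetical unit vector $v$ stored by its coefficient matrix `M`, so that $\hat{v}$ acts as $z \mapsto Mz$. It conditions on the outcome $\bar{y} = \bar{e}_0$ and compares the resulting conditional probabilities in a second basis with those of the pure state $\hat{v}(y)/\|\hat{v}(y)\|$. The inner product is taken linear in its first argument, as in the text; the specific numbers are illustrative assumptions.

```python
import math


def inner(u, w):
    # <u, w>: linear in the first argument, conjugate-linear in the second
    return sum(p * q.conjugate() for p, q in zip(u, w))


def v_hat(M, z):
    # For v = sum_ij M[i][j] e_i (x) conj(e_j), v_hat = sum_ij M[i][j] |e_i><e_j| acts as z -> M z
    return [sum(M[i][j] * z[j] for j in range(len(z))) for i in range(len(M))]


def joint_prob(M, x, y):
    # omega(x, y_bar) = |<v, x (x) y_bar>|^2 = |sum_ij M[i][j] conj(x_i) y_j|^2
    amp = sum(M[i][j] * x[i].conjugate() * y[j]
              for i in range(len(x)) for j in range(len(y)))
    return abs(amp) ** 2


def conditional_probs(M, basis, y):
    # Distribution of outcomes of `basis` on the first factor, given outcome y_bar on the second
    joint = [joint_prob(M, x, y) for x in basis]
    total = sum(joint)
    return [p / total for p in joint]


M = [[0.5, 0.5j], [0.5, -0.5]]        # hypothetical unit vector v (sum of |M_ij|^2 is 1)
y = [1 + 0j, 0j]                       # outcome y = e_0 on the conjugate factor
s = 1 / math.sqrt(2)
basis = [[s + 0j, s + 0j], [s + 0j, -s + 0j]]

u = v_hat(M, y)
norm = math.sqrt(abs(inner(u, u)))
u = [c / norm for c in u]              # the unit vector v_hat(y) / ||v_hat(y)||

p = conditional_probs(M, basis, y)
q = [abs(inner(u, x)) ** 2 for x in basis]
print(p, q)                            # the two distributions agree
```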

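Purification and Correlation. Suppose now that $\alpha$ is a state on $A(\mathcal{H})$, represented by a density
operator $W$ on $\mathcal{H}$ with spectral resolution

$W = \sum_{x \in E} \lambda_x P_x = \sum_{x \in E} \lambda_x\, x \odot x,$

where $E$ is an orthonormal basis for $\mathcal{H}$ and $\sum_{x \in E} \lambda_x = \mathrm{Tr}(W) = 1$. Functional calculus gives us
$W^{1/2} = \sum_{x \in E} \lambda_x^{1/2}\, x \odot x$. We can interpret this as a unit vector in $\mathcal{H} \otimes \bar{\mathcal{H}}$, namely

$\Psi_W := \sum_{x \in E} \lambda_x^{1/2}\, x \otimes \bar{x}. \qquad (1)$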

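This, in turn, defines a bipartite state on the composite quantum system $A\bar{A} := A(\mathcal{H} \otimes \bar{\mathcal{H}})$. The
marginal, or reduced, state of the first component system is given by

$\omega_1(a) = \mathrm{Tr}\big(P_{\Psi_W}(a \otimes \mathbf{1})\big) = \langle (a \otimes \mathbf{1})\Psi_W, \Psi_W \rangle = \mathrm{Tr}(Wa),$

so the pure state corresponding to $\Psi_W$ is a dilation of the given mixed state $W$. Now observe that
if $u, v \in E$ with $u \perp v$, then we have

$\langle \Psi_W, u \otimes \bar{v} \rangle = \sum_{x \in E} \lambda_x^{1/2} \langle x, u \rangle \langle v, x \rangle = 0.$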

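Evidently, the pure state $\omega$ corresponding to $\Psi_W$ sets up a perfect correlation between $E \in \mathcal{M}(\mathcal{H})$
and the corresponding test $\bar{E} = \{\bar{x} \mid x \in E\} \in \mathcal{M}(\bar{\mathcal{H}})$, with

$\omega(x, \bar{x}) = |\langle \Psi_W, x \otimes \bar{x} \rangle|^2 = |\lambda_x^{1/2}|^2 = \lambda_x.$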

An especially interesting case arises when $\alpha$ is the maximally mixed state, i.e., when $W = \mathbf{1}/n$
(where $n = \dim(\mathcal{H})$). Then $\Psi_W$ is independent of the choice of $E$ (since every orthonormal basis
of $\mathcal{H}$ is an eigenbasis for $\mathbf{1}$). Hence, $\Psi_W$ simultaneously correlates every test $E \in \mathcal{M}(\mathcal{H})$ with its
counterpart $\bar{E} \in \mathcal{M}(\bar{\mathcal{H}})$. Moreover, the correlation is uniform, in that the probability of each correlated
pair $x \otimes \bar{x}$ of outcomes is $1/n$. As we'll see later, the existence of such a uniformly
correlating state between two isomorphic systems has interesting consequences.
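The purification (1) can be checked numerically. To keep the sketch short we assume a real Hilbert space $\mathbb{R}^2$, where $\bar{x}$ may be identified with $x$, so that $\Psi_W$ is stored as its coefficient matrix (which equals $W^{1/2}$); the eigenvalues, bases and operator `a` below are hypothetical. The sketch checks that the marginal of $\Psi_W$ is $\mathrm{Tr}(Wa)$, that $\Psi_W$ perfectly correlates the eigenbasis with joint probabilities $\lambda_x$, and that in the maximally mixed case the correlation is uniform in an unrelated basis.

```python
import math


def basis_at(theta):
    # A real orthonormal basis of R^2, rotated by the angle theta
    return [[math.cos(theta), math.sin(theta)], [-math.sin(theta), math.cos(theta)]]


def purification(eigvals, basis):
    # Psi_W = sum_x sqrt(lambda_x) x (x) x (real case), as coefficients psi[i][j]
    n = len(basis)
    return [[sum(math.sqrt(lam) * x[i] * x[j] for lam, x in zip(eigvals, basis))
             for j in range(n)] for i in range(n)]


def marginal(psi, a):
    # omega_1(a) = <(a (x) 1) Psi, Psi>
    n = len(psi)
    return sum(a[i][k] * psi[k][j] * psi[i][j]
               for i in range(n) for j in range(n) for k in range(n))


def trace_Wa(eigvals, basis, a):
    # Tr(W a) = sum_x lambda_x <a x, x>
    n = len(basis)
    return sum(lam * sum(x[i] * a[i][k] * x[k] for i in range(n) for k in range(n))
               for lam, x in zip(eigvals, basis))


def joint(psi, x, y):
    # omega(x, y_bar) = |<Psi, x (x) y_bar>|^2 for real x, y
    n = len(psi)
    return sum(psi[i][j] * x[i] * y[j] for i in range(n) for j in range(n)) ** 2


eigvals, E = [0.7, 0.3], basis_at(0.4)
psi = purification(eigvals, E)
a = [[1.0, 2.0], [2.0, -1.0]]                        # an arbitrary symmetric operator
print(marginal(psi, a), trace_Wa(eigvals, E, a))     # equal up to rounding
print([[joint(psi, x, y) for y in E] for x in E])    # diagonal = eigenvalues, off-diagonal = 0

# Maximally mixed case: the correlation is uniform in any other orthonormal basis
psi_mm = purification([0.5, 0.5], basis_at(0.4))
other = basis_at(1.1)
print([[joint(psi_mm, x, y) for y in other] for x in other])
```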

Local Tomography. If $\mathcal{H}$ and $\mathcal{K}$ are real or complex Hilbert spaces of dimensions m and n,
respectively, then, as was remarked above, $A(\mathcal{H} \otimes \mathcal{K})$ is a non-signaling composite of $A(\mathcal{H})$ and $A(\mathcal{K})$.
It is easily checked that $\dim \mathcal{L}_h(\mathcal{H}) = m^2$ if $\mathcal{H}$ is complex and $(m^2 + m)/2$ if $\mathcal{H}$
is real. Hence, in the complex case, the dimension of the real vector space $E(\mathcal{H} \otimes \mathcal{K}) = \mathcal{L}_h(\mathcal{H} \otimes \mathcal{K})$ of Hermitian
operators is $(mn)^2 = m^2 n^2$, so in fact $\mathcal{L}_h(\mathcal{H} \otimes \mathcal{K}) = \mathcal{L}_h(\mathcal{H}) \otimes \mathcal{L}_h(\mathcal{K})$, and the composite system
is locally tomographic. On the other hand, if $\mathcal{H}$ and $\mathcal{K}$ are real, the dimension of $\mathcal{L}_h(\mathcal{H} \otimes \mathcal{K})$ is
$((mn)^2 - mn)/2 + mn = ((mn)^2 + mn)/2$, while the product of the dimensions of $\mathcal{L}_h(\mathcal{H})$ and $\mathcal{L}_h(\mathcal{K})$
is

$\frac{(m^2 + m)(n^2 + n)}{2 \cdot 2} = \frac{m^2 n^2 + m^2 n + m n^2 + mn}{4}.$

For $m, n \geq 2$ this is strictly less than $(m^2 n^2 + mn)/2$ (the difference is $mn(m-1)(n-1)/4$), which in turn is less than $(mn)^2$, so in this case $E(AB)$ is
strictly larger than $E(A) \otimes E(B)$. Thus, for real Hilbert spaces $\mathcal{H}$ and $\mathcal{K}$, the standard composite
$A(\mathcal{H} \otimes \mathcal{K})$ is not locally tomographic. (Neither do we have local tomography for quaternionic
Hilbert spaces, though here one needs to be more careful about the formulation of the relevant
tensor products. See [6] and [44] for more details.)
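The dimension count above is easy to tabulate. This small Python sketch (the helper names are ours) compares $\dim \mathcal{L}_h(\mathcal{H} \otimes \mathcal{K})$ with $\dim \mathcal{L}_h(\mathcal{H}) \cdot \dim \mathcal{L}_h(\mathcal{K})$ for a few small m, n; equality, i.e. local tomography of the standard composite, holds in the complex case but fails in the real case, with the gap $mn(m-1)(n-1)/4$ computed in the text.

```python
def dim_hermitian(d, field):
    # Real dimension of the space of self-adjoint operators on a d-dimensional Hilbert space
    if field == "complex":
        return d * d
    if field == "real":
        return d * (d + 1) // 2
    raise ValueError("field must be 'complex' or 'real'")


def locally_tomographic(m, n, field):
    # The standard composite is locally tomographic iff dim E(AB) = dim E(A) * dim E(B)
    return dim_hermitian(m * n, field) == dim_hermitian(m, field) * dim_hermitian(n, field)


for m, n in [(2, 2), (2, 3), (3, 3)]:
    real_ab = dim_hermitian(m * n, "real")
    real_prod = dim_hermitian(m, "real") * dim_hermitian(n, "real")
    # the gap should equal m n (m - 1)(n - 1) / 4
    print(m, n, real_ab, real_prod, real_ab - real_prod == m * n * (m - 1) * (n - 1) // 4)
```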

3.4 Maximal and Minimal Tensor Products 

Let AB be a non-signaling composite of two systems A and B. As noted above, if AB is locally
tomographic, then $E(AB) \simeq E(A) \otimes E(B)$ as vector spaces. In this section, we consider more
closely the possibilities for such a composite.

As we saw earlier, any non-signaling state $\omega$ on any composite system AB is associated with a
bilinear form on $E(A) \times E(B)$. If AB is locally tomographic, then we can identify $\omega$ with this form.
We then see that there are two extreme possibilities for the set of states on a locally tomographic 
composite AB: maximally, we may include all positive, normalized bilinear forms on E(A) x E(B); 
minimally, we may restrict our attention to the closed convex hull of the product states. 

Definition 15. Let E and F be any two finite-dimensional ordered vector spaces. The minimal
tensor cone on $E \otimes F$ is the cone generated by pure tensors $a \otimes b$ with $a \in E_+$ and $b \in F_+$. The
maximal tensor cone is the cone of all tensors $\tau \in E \otimes F$ such that $\tau(\omega) \geq 0$ for all $\omega \in S_+(E, F)$.
These two cones give us two different ordered tensor products, which we denote by $E \otimes_{\min} F$ and
$E \otimes_{\max} F$, respectively.

It is not difficult to see that (in finite dimensions) we have

$(E \otimes_{\min} F)^* = E^* \otimes_{\max} F^* \quad\text{and}\quad (E \otimes_{\max} F)^* = E^* \otimes_{\min} F^*.$

Let AB be any locally tomographic composite of models A and B. Then the set $X(A)X(B)$
of product outcomes in $E(AB) \simeq E(A) \otimes E(B)$ generates exactly the minimal tensor cone in
$E(A) \otimes E(B)$. It follows that the cone of un-normalized non-signaling states on $\mathcal{M}(A) \times \mathcal{M}(B)$
is exactly the maximal tensor cone in $V(A) \otimes V(B)$. Dually, the set of product states
generates the minimal tensor cone in $V(A) \otimes V(B)$.

Definition 16. Thus, we may define the minimal tensor product of A and B to be the model
$A \otimes_{\min} B = (E(A) \otimes_{\min} E(B), X(A) \times X(B))$. By the maximal tensor product of A and B we
mean the model $A \otimes_{\max} B = (E(A) \otimes_{\max} E(B), X(A) \otimes_{\max} X(B))$, where the test space $X(A) \otimes_{\max} X(B)$ is the
maximal test space for $E(A) \otimes_{\max} E(B)$.

The choices of these two test spaces are dictated by the desire to have the following result.

Proposition 5. If AB is any locally tomographic composite of A and B, then we have embeddings
$A \otimes_{\min} B \to AB \to A \otimes_{\max} B$. We also have, dually, $\Omega(A \otimes_{\max} B) \subseteq \Omega(AB) \subseteq \Omega(A \otimes_{\min} B)$.

Thus, $A \otimes_{\min} B$ is the smallest possible locally tomographic composite of A and B, in the sense
of having the fewest possible effects. Dually, $E(A \otimes_{\min} B)^* = E(A)^* \otimes_{\max} E(B)^*$ has the largest
possible state space among locally tomographic composites. One might say, roughly speaking, that
$A \otimes_{\min} B$ admits no entanglement between effects and, consequently, admits all possible entangled
states. At the other extreme, $A \otimes_{\max} B$ admits every possible entangled bipartite effect and, in
consequence, admits no entanglement of states.

If $\Omega(A)$ or $\Omega(B)$ is a simplex, then it is easy to show that $V(A) \otimes_{\max} V(B) = V(A) \otimes_{\min} V(B)$
and $E(A) \otimes_{\max} E(B) = E(A) \otimes_{\min} E(B)$. Thus, a classical system admits no entangled states or
effects in any non-signaling composite with another system. There is a partial converse:

Theorem 6 ([52]). The following are equivalent:

(a) $V(A) \otimes_{\max} V(B)$ contains no entangled state, for any model B;

(b) $V(A) \otimes_{\max} V(B)$ contains no entangled state, where B is the square bit (Example ...);

(c) $\Omega(A)$ is a simplex.

It follows that any non-classical system A (one with a non-simplicial state space) will admit
some locally tomographic, non-signaling composite AB that admits entangled states. In this sense,
entanglement is a highly generic phenomenon in non-classical probability theory.

3.5 Monoidal Probabilistic Theories 

Earlier, we decided to represent a probabilistic theory as a category of probabilistic models with
positive mappings as morphisms. It is not unreasonable to require that, if A, B and C are three
systems, we should be able to form tripartite composites (AB)C and A(BC). We would perhaps like to
require that these be the same, i.e., that we have an associative rule of composition. This is not a
trivial requirement (one can readily imagine situations in which the composition of systems might
not be associative¹³), but it is a natural one.

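A symmetric monoidal category is a category $\mathcal{C}$, equipped with a bifunctor $\otimes : \mathcal{C} \times \mathcal{C} \to \mathcal{C}$, such
that for all $A, B, C \in \mathcal{C}$,

$A \otimes (B \otimes C) \simeq (A \otimes B) \otimes C \quad\text{and}\quad A \otimes B \simeq B \otimes A$

by means of natural isomorphisms $\alpha_{A,B,C}$ and $\sigma_{A,B}$ belonging to $\mathcal{C}$; and also equipped with a tensor
unit, $I$, and natural isomorphisms

$I \otimes A \simeq A \simeq A \otimes I,$

all subject to the standard coherence conditions.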

¹³ Consider, for instance, the case of

(Farmer ⊗ Hen) ⊗ Fox vs. Farmer ⊗ (Hen ⊗ Fox).
This point of view has been extensively developed in the categorical semantics for quantum
theory of Abramsky and Coecke and of Selinger [?, ?, 59], and also in the work of Baez and his
students [7].

Definition 17. A monoidal probabilistic theory is a probabilistic theory $\mathcal{C}$, equipped with a rule
of composition $A, B \mapsto AB$ assigning, to each pair of models $A, B \in \mathcal{C}$, a composite AB in the
sense defined earlier, and making $\mathcal{C}$ a symmetric monoidal category. We shall say that $\mathcal{C}$ is
non-signaling, respectively locally tomographic, iff AB is non-signaling, respectively locally tomographic,
for every pair $A, B \in \mathcal{C}$.

This definition implies that, for all $A, B \in \mathcal{C}$ and all states $\alpha \in \Omega(A)$, $\beta \in \Omega(B)$, there is a
distinguished product state $\alpha \otimes \beta$ with $(\alpha \otimes \beta)(xy) = \alpha(x)\beta(y)$ for all $x \in X(A)$, $y \in X(B)$.
Similarly, for any (dual) processes $\tau_1 \in \mathcal{C}(A)$ and $\tau_2 \in \mathcal{C}(B)$, there exists a process $\tau_1 \otimes \tau_2 \in \mathcal{C}(AB)$
with $(\tau_1 \otimes \tau_2)(a \otimes b) = \tau_1(a) \otimes \tau_2(b)$ for all effects $a \in E(A)$ and $b \in E(B)$.

Finite-dimensional classical and quantum probability theory are both monoidal with respect to
their usual rules of composition. The minimal and maximal tensor products are each naturally
associative, and hence make the category of all probabilistic models into a monoidal probabilistic
theory; but neither is entirely satisfactory: the former provides for entangled states but does not
permit entangled effects, while the latter provides for entanglement between effects but allows none
between states. That a probabilistic theory support a single "tensor product" that accommodates
entanglement of both states and effects is a non-trivial constraint. To be sure, one might consider
probabilistic theories equipped with more than one rule of composition; however, the interactions
among different non-signaling compositions on a given theory can be very delicate. It therefore
seems reasonable to begin by investigating the simpler possibilities for a theory equipped with a
single privileged, monoidal rule of composition. Accordingly, in the balance of this paper, we
work in a monoidal probabilistic theory $\mathcal{C}$.

Historical Remarks. Tensor products of compact convex sets or of order-unit spaces were studied
in a number of papers in the late 1960s, notably that of Namioka and Phelps [52]. The fact that the
marginal of an entangled pure state must be a mixed state already appears there, albeit not in these
terms, as do the definitions of what we are calling the maximal and minimal tensor products. Our
treatment of composite systems derives from that of Foulis and Randall [34, 44]. Some first attempts
to understand probabilistic theories as symmetric monoidal categories of probabilistic models can
be found in [17, 15]; work in this direction is ongoing.



4 Post-Classical Information Processing

As we've seen, entangled bipartite states and effects arise very naturally, not only in quantum theory,
but in almost any context in which we form non-signaling composites of non-classical systems.
While this observation goes back at least to [?, 23] in the late 1980s, it remained unexploited.
Entanglement lies at the heart of quantum information theory, so it is natural to wonder to what
extent quantum information-theoretic results carry over to other non-classical settings. It turns
out that a great many such results do have analogues for probabilistic theories that are far more
general than quantum mechanics. While the exploration of this post-classical information theory is
still in its infancy, it has already shed considerable light on the scope and meaning of several key
quantum-informational results.

In this section, we review in some detail two of these. The first is the no-cloning theorem and
its generalization, the no-broadcasting theorem. These hold in any finite-dimensional theory having
a state space that is not a simplex. The second is the existence of a teleportation protocol or,
a bit more generally, of an entanglement-swapping protocol. Here, some restrictions need to be
made, but they are of moderate strength. For example, any monoidal probabilistic theory in which
individual systems are weakly self-dual, and composites include isomorphism states $\omega$ and effects
$f$ corresponding to isomorphisms $\hat{\omega}$, $\hat{f}$ witnessing the weak self-duality, supports a certain kind of
teleportation. Moreover, when viewed in this generality, teleportation loses most of its mystery: it
is simply a form of classical conditioning, one which appears startling only owing to the appearance
of isomorphism states.



4.1 Cloning and broadcasting 

To clone a state of a system A means, very broadly, to produce two independent copies of that state
by means of some physical process. In the present formalism, if the initial state belongs to a system
A, this would require a positive linear mapping

$\phi : V(A) \to V(AA)$

such that $\phi(\alpha) = \alpha \otimes \alpha$. There is no difficulty producing such a mapping: indeed, the constant
mapping $\Omega(A) \to \Omega(AA)$ given by $\beta \mapsto \alpha \otimes \alpha$ for all $\beta \in \Omega(A)$ is affine, and hence extends uniquely to a
positive linear mapping $V(A) \to V(AA)$. However, this mapping is (highly!) state-dependent. One
might ask whether one could jointly clone a collection of states, say, $\alpha_1, \ldots, \alpha_n$. That is: given such
a set of states, can one find a single, norm-nonincreasing, positive linear mapping $V(A) \to V(AA)$
that clones them all, in the sense that $\phi(\alpha_i) = \alpha_i \otimes \alpha_i$ for all $i$?

If the states $\alpha_i$ are jointly distinguishable, the answer is yes. If $\{a_i\}$ is an observable on A with
$\alpha_i(a_i) = 1$, then the mapping

$\phi(\beta) = \sum_i \beta(a_i)\, \alpha_i \otimes \alpha_i$

does the trick: since $\alpha_j(a_i) = \delta_{ij}$, we get $\phi(\alpha_j) = \alpha_j \otimes \alpha_j$. The no-cloning theorem is essentially the converse: if there exists a single process
that will clone all of the states $\alpha_1, \ldots, \alpha_n$, then there exists an observable that distinguishes them.
In the case of a discrete classical model, where all pure states are jointly distinguishable, this is
no restriction on the clonability of pure states; but quantum pure states, which are not jointly
distinguishable, are in general not jointly clonable.
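The cloning map $\phi(\beta) = \sum_i \beta(a_i)\, \alpha_i \otimes \alpha_i$ can be tried out on a toy classical system. In this Python sketch the four-outcome system, the two distinguishable (but mixed) states `alpha1`, `alpha2` and the observable are hypothetical choices of ours; states and effects are lists of exact fractions. It confirms that both states are cloned, and that $\phi$ is not a universal cloner.

```python
from fractions import Fraction


def evaluate(state, effect):
    # beta(a) = sum_x beta(x) a(x) for a classical state and effect
    return sum(s * e for s, e in zip(state, effect))


def product(alpha, beta):
    # (alpha (x) beta)(x, y) = alpha(x) beta(y), stored as a nested list
    return [[p * q for q in beta] for p in alpha]


def cloning_map(states, observable):
    # phi(beta) = sum_i beta(a_i) alpha_i (x) alpha_i, for an observable {a_i} with alpha_i(a_i) = 1
    def phi(beta):
        n = len(beta)
        out = [[Fraction(0)] * n for _ in range(n)]
        for alpha, a in zip(states, observable):
            weight = evaluate(beta, a)
            for x in range(n):
                for y in range(n):
                    out[x][y] += weight * alpha[x] * alpha[y]
        return out
    return phi


# Two distinguishable (but mixed) states of a four-outcome classical system
alpha1 = [Fraction(1, 2), Fraction(1, 2), Fraction(0), Fraction(0)]
alpha2 = [Fraction(0), Fraction(0), Fraction(1, 4), Fraction(3, 4)]
observable = [[1, 1, 0, 0], [0, 0, 1, 1]]      # a_1 + a_2 = unit, alpha_i(a_i) = 1
phi = cloning_map([alpha1, alpha2], observable)

print(phi(alpha1) == product(alpha1, alpha1), phi(alpha2) == product(alpha2, alpha2))

# phi is not a universal cloner: the uniform state is not mapped to its own square
uniform = [Fraction(1, 4)] * 4
print(phi(uniform) == product(uniform, uniform))
```

This prints `True True` and then `False`: $\phi$ sends the uniform state to $\tfrac12(\alpha_1 \otimes \alpha_1 + \alpha_2 \otimes \alpha_2)$ rather than to its square.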

The quantum no-cloning theorem was first proved, independently, by Wootters and Zurek [73]
and by Dieks [25]. That the same result holds for arbitrary probabilistic theories is proved in [9]. We
omit the proof here, but the idea is simple: if we can clone each of the states $\alpha_1, \ldots, \alpha_n$ with a single
mapping, then by iterating this process, we can create arbitrarily large ensembles of independent
copies of an unknown state $\alpha \in \{\alpha_1, \ldots, \alpha_n\}$ and, by making measurements on this ensemble, we can
use statistics to distinguish among them.

We say that a state $\rho \in \Omega$ is broadcast by an affine mapping $\phi : \Omega \to \Omega \otimes \Omega$ iff the bipartite
state $\phi(\rho)$ has marginal states $\phi(\rho)_1$ and $\phi(\rho)_2$ both equal to $\rho$. If $\rho$ can be expressed as a mixture
of distinguishable (hence, clonable) states $\alpha_1, \ldots, \alpha_n$, say $\rho = \sum_i t_i \alpha_i$, then one can broadcast
$\rho$ using a cloning map $\phi$ for the states $\alpha_1, \ldots, \alpha_n$: the state $\phi(\rho) = \sum_i t_i\, \alpha_i \otimes \alpha_i$ has both marginal
states equal to $\rho$, as required. The quantum no-broadcasting theorem of Barnum et al. [8] tells us
that, conversely, two quantum states are jointly broadcastable iff, regarded as density operators,
they commute, which, by the Spectral Theorem, is equivalent to requiring that both are convex
combinations of some single set of distinguishable pure states. In fact, this is a corollary of a more
general result:

Theorem 7 ([?]). Let $T$ be the set of states broadcast by an affine mapping $\phi : \Omega \to \Omega \otimes \Omega$.
Then $T$ is the simplex generated by a set of distinguishable states in $\Omega$, which are cloned by $\phi$.

(Although we omit the proof here, it is not especially difficult. This is in contrast to earlier
proofs of the quantum no-broadcasting result [8, 45], which were not especially easy.)
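Broadcasting a mixture of distinguishable states with a cloning map can be checked on the same kind of toy classical system. In this sketch (hypothetical states and mixing weight $t = 1/3$, exact fractions) the two marginals of $\phi(\rho)$ both equal $\rho$, yet $\phi(\rho)$ is not the product $\rho \otimes \rho$: broadcasting is strictly weaker than cloning.

```python
from fractions import Fraction


def product(alpha, beta):
    return [[p * q for q in beta] for p in alpha]


def cloning_map(states, observable):
    # phi(beta) = sum_i beta(a_i) alpha_i (x) alpha_i, as in the cloning discussion
    def phi(beta):
        n = len(beta)
        out = [[Fraction(0)] * n for _ in range(n)]
        for alpha, a in zip(states, observable):
            weight = sum(b * e for b, e in zip(beta, a))
            for x in range(n):
                for y in range(n):
                    out[x][y] += weight * alpha[x] * alpha[y]
        return out
    return phi


def marginals(joint):
    # (phi(rho)_1, phi(rho)_2): row sums and column sums of the joint table
    return [sum(row) for row in joint], [sum(col) for col in zip(*joint)]


alpha1 = [Fraction(1, 2), Fraction(1, 2), Fraction(0), Fraction(0)]
alpha2 = [Fraction(0), Fraction(0), Fraction(1, 4), Fraction(3, 4)]
phi = cloning_map([alpha1, alpha2], [[1, 1, 0, 0], [0, 0, 1, 1]])

t = Fraction(1, 3)
rho = [t * p + (1 - t) * q for p, q in zip(alpha1, alpha2)]   # rho = t alpha1 + (1 - t) alpha2
broadcast = phi(rho)

first, second = marginals(broadcast)
print(first == rho, second == rho)           # both marginals equal rho
print(broadcast == product(rho, rho))        # but phi(rho) is not rho (x) rho
```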

4.2 Remote Evaluation 

Suppose $\mathcal{C}$ is a locally tomographic, monoidal probabilistic theory. Consider two parties, Alice
and Bob, occupying arbitrarily distant sites. Suppose that Alice controls a pair of systems, say
$A_0, A_1 \in \mathcal{C}$, while Bob controls a system $B \in \mathcal{C}$. Since $\mathcal{C}$ is monoidal, we can represent Alice's two
systems together as a single bipartite system $A = A_0A_1$, and the entire Alice-Bob system by the
tripartite composite $AB = (A_0A_1)B \simeq A_0(A_1B)$.



Now suppose that the composite system $A_1B$ is in a state $\omega$, while Alice's system $A_0$ is in a
state $\alpha$, independent of the $A_1B$ sub-system. Then the total state of the system $AB = A_0(A_1B)$ is
$\alpha \otimes \omega$. Now let Alice make a measurement on her system $A = A_0A_1$, obtaining a result represented
by an effect $f \in E(A)$; suppose Bob also makes a measurement on his system, B, obtaining a result
represented by an effect $b \in E(B)$, so that the joint outcome of these two measurements is $f \otimes b$.

Lemma 5 (Remote Evaluation). With notation as above, let $\hat{\omega} : E(A_1) \to V(B)$ and $\hat{f} : V(A_0) \to E(A_1)$
be the conditioning and co-conditioning maps associated with the state $\omega$ and the effect $f$.
Then, for all $\alpha \in V(A_0)$ and all $b \in E(B)$,

$(\alpha \otimes \omega)(f \otimes b) = \hat{\omega}(\hat{f}(\alpha))(b). \qquad (2)$

The proof is easy: one simply checks that the formula is correct when $\omega$ is a product state and $f$
is a product effect. Since we are working with locally tomographic composites, product states and
effects span $V(A_1B) = E(A_1B)^*$ and $E(A_0A_1)$, respectively, so (2) holds for all choices of $\omega$ and $f$. Nevertheless,
the result is somewhat surprising, for it asserts that the mapping

$\tau = \hat{\omega} \circ \hat{f} : V(A_0) \to V(B)$

can be implemented, probabilistically, by means of a preparation of $A_1B$ in the joint state $\omega$ and
a (successful) observation of $f$ on $A_0A_1$. In particular, when Alice observes the effect $f$, the
corresponding un-normalized conditional state of Bob's system is

$(\alpha \otimes \omega)(f \otimes -) = \tau(\alpha).$

Note that the probability of the process $\tau$ occurring in state $\alpha$ is $u_B(\tau(\alpha))$, which is exactly the
marginal probability $(\alpha \otimes \omega)_1(f)$ of Alice's obtaining $f$. In what follows, we refer to the pair $(f, \omega)$
as a remote evaluation protocol for the process $\tau = \hat{\omega} \circ \hat{f}$.
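For classical systems, equation (2) reduces to a rearrangement of a finite sum, which makes it easy to check. In this Python sketch, $A_0$, $A_1$ and $B$ are hypothetical classical systems with 2, 3 and 2 outcomes; $\alpha$, $\omega$, $f$ and $b$ are arbitrary valid choices given as exact fractions. It compares $(\alpha \otimes \omega)(f \otimes b)$ with $\hat{\omega}(\hat{f}(\alpha))(b)$, and checks that $u_B(\tau(\alpha))$ is the probability of Alice's observing $f$.

```python
from fractions import Fraction as Fr

# Hypothetical classical instance: A0, A1 and B have 2, 3 and 2 outcomes respectively.
alpha = [Fr(1, 3), Fr(2, 3)]              # state of A0
omega = [[Fr(1, 6), Fr(1, 12)],           # joint state of A1 B, omega[y][z]
         [Fr(1, 4), Fr(1, 12)],
         [Fr(1, 12), Fr(1, 3)]]
f = [[1, 0, Fr(1, 2)],                    # effect on A0 A1, f[x][y] in [0, 1]
     [0, 1, Fr(1, 4)]]
b = [Fr(1, 2), 1]                         # effect on B


def joint_probability(alpha, omega, f, b):
    # (alpha (x) omega)(f (x) b) = sum_{x,y,z} alpha(x) omega(y, z) f(x, y) b(z)
    return sum(alpha[x] * omega[y][z] * f[x][y] * b[z]
               for x in range(len(alpha))
               for y in range(len(omega))
               for z in range(len(b)))


def co_conditioning(f, alpha):
    # f_hat(alpha)(y) = sum_x alpha(x) f(x, y): an effect on A1
    return [sum(alpha[x] * f[x][y] for x in range(len(alpha))) for y in range(len(f[0]))]


def conditioning(omega, a):
    # omega_hat(a)(z) = sum_y a(y) omega(y, z): an un-normalized state of B
    return [sum(a[y] * omega[y][z] for y in range(len(omega))) for z in range(len(omega[0]))]


tau_alpha = conditioning(omega, co_conditioning(f, alpha))          # tau = omega_hat o f_hat
lhs = joint_probability(alpha, omega, f, b)
rhs = sum(t * e for t, e in zip(tau_alpha, b))
print(lhs == rhs)                                                   # equation (2)
print(sum(tau_alpha) == joint_probability(alpha, omega, f, [1, 1]))  # u_B(tau(alpha)) = Pr(f)
```

Both checks print `True`. In this classical setting the lemma is just associativity of matrix multiplication, which is one way to see why remote evaluation is "a form of classical conditioning".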

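We can reformulate the notions of conditioning and co-conditioning maps, and the remote evaluation
lemma (Lemma 5), in purely categorical terms. In fact, both make sense in any symmetric
monoidal category $\mathcal{C}$. Given objects $A, B \in \mathcal{C}$ and a morphism $\omega : A \otimes B \to I$, there is a canonical
mapping $\hat{\omega} : \mathcal{C}(I, A) \to \mathcal{C}(B, I)$ given by

$\hat{\omega}(a) = \omega \circ (a \otimes \mathrm{id}_B). \qquad (3)$

Dually, if $f \in \mathcal{C}(I, A \otimes B)$, there is a natural mapping $\hat{f} : \mathcal{C}(A, I) \to \mathcal{C}(I, B)$ given by

$\hat{f}(\alpha) = (\alpha \otimes \mathrm{id}_B) \circ f. \qquad (4)$

(Here and below we suppress the unit isomorphisms $I \otimes B \simeq B$.)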

If $\mathcal{C}$ is a monoidal probabilistic theory, then $\hat{\omega}$ and $\hat{f}$, defined in this way, correspond exactly to the
conditioning and co-conditioning maps associated with the bipartite state $\omega : A \otimes B \to I$ and effect
$f : I \to A \otimes B$. Combining (3) and (4), and taking advantage of the monoidal structure
of $\mathcal{C}$ (in particular, the fact that $\alpha \otimes \omega = \omega \circ (\alpha \otimes \mathrm{id}_{A_1B})$), we have

$\omega \circ (\hat{f}(\alpha) \otimes \mathrm{id}_B) = \omega \circ (\alpha \otimes \mathrm{id}_{A_1B}) \circ (f \otimes \mathrm{id}_B) = (\alpha \otimes \omega) \circ (f \otimes \mathrm{id}_B), \qquad (5)$

which precisely expresses Lemma 5.

[Diagram (6): string diagram on the systems $A_0 \otimes A_1 \otimes B$.]




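This has an important corollary. Since $\omega \circ (\alpha \otimes \mathrm{id}_{A_1B}) = \alpha \circ (\mathrm{id}_{A_0} \otimes \omega)$, we can rewrite (6) as

$\omega \circ (\hat{f}(\alpha) \otimes \mathrm{id}_B) = \alpha \circ (\mathrm{id}_{A_0} \otimes \omega) \circ (f \otimes \mathrm{id}_B).$

Thus, the dual process $\tau^* : E(B) \to E(A_0)$ corresponding to the process $\tau = \hat{\omega} \circ \hat{f}$ arising in the
remote evaluation protocol is in fact a morphism of $\mathcal{C}$, namely $(\mathrm{id}_{A_0} \otimes \omega) \circ (f \otimes \mathrm{id}_B) : B \to A_0$.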

Conclusive Teleportation. In the special case in which the models $A_0$, $A_1$ and $B$ are isomorphic
and weakly self-dual, we can consider a remote evaluation protocol in which both the effect
$f \in E(A)$ and the state $\omega \in \Omega(A_1B)$ correspond to order-isomorphisms $\hat{f} : V(A_0) \simeq E(A_1)$ and
$\hat{\omega} : E(A_1) \simeq V(B)$. In this case, the process $\tau = \hat{\omega} \circ \hat{f}$ is again an order-isomorphism. If this scenario
is repeated many times, Bob can perform sufficiently many measurements to determine $\tau(\alpha)$ with
reasonable confidence, and then compute the value of $\alpha$. On the other hand, if $\tau$ is probabilistically
reversible, then in a single run of the scenario Bob can actually correct his state, with non-zero
probability, so that it agrees with $\alpha$. In this case, we may say that the state $\alpha$ has been teleported from
Alice's system $A_0$ to Bob's system $B$, and refer to $(f, \omega)$ as a teleportation protocol. If $\tau$ is reversible
with probability 1, we shall say that $(f, \omega)$ is a strong teleportation protocol.

Deterministic Teleportation. Suppose now that Alice has access to an observable $\{f_i\}$ on $A =
A_0A_1$, with each of the effects $f_i$ an isomorphism effect. Each of these effects, in combination
with the isomorphism state $\omega$, gives rise to a conclusive teleportation protocol, implementing the
order-isomorphism $\tau_i = \hat{\omega} \circ \hat{f}_i : V(A_0) \simeq V(B)$. If Alice is permitted to communicate (classically)
with Bob, then upon observing outcome $f_i$, which occurs with probability $c_i := u_B(\tau_i(\alpha))$, she can
instruct Bob to implement the inverse process $\tau_i^{-1}$ (suitably normalized), returning his conditional
state to $\alpha$. It follows that the post-measurement state of Bob's system will be $\sum_i c_i \alpha = \alpha$; in particular,
$\sum_i c_i = \sum_i u_B(\tau_i(\alpha)) = 1$. Say that A supports a deterministic teleportation protocol iff there exists
such an observable $\{f_i\}$ and such a state $\omega$.
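A toy instance may help. Classical systems are weakly self-dual, and the following Python sketch (our own construction, not from the source) uses three classical systems with outcomes $\mathbb{Z}_n$: the isomorphism state $\omega(y, z) = [y = z]/n$ on $A_1B$ and the isomorphism effects $f_k(x, y) = [y = x + k \bmod n]$, which sum to the unit. Each outcome $k$ occurs with probability $c_k = 1/n$, Bob's conditional state is $\alpha$ shifted by $k$, and undoing the shift recovers $\alpha$ exactly. Here teleportation really is nothing but classical conditioning.

```python
from fractions import Fraction


def teleport_classically(alpha):
    # Hypothetical classical model: A0, A1 and B each have outcomes Z_n = {0, ..., n-1}.
    n = len(alpha)
    # Isomorphism state on A1 B: uniform perfect correlation, omega(y, z) = [y == z] / n
    omega = [[Fraction(1, n) if y == z else Fraction(0) for z in range(n)] for y in range(n)]
    results = []
    for k in range(n):
        # Isomorphism effect f_k on A0 A1: f_k(x, y) = 1 iff y = x + k (mod n).
        # The f_k sum to the unit effect, so {f_k} is an observable.
        f_k = [[1 if y == (x + k) % n else 0 for y in range(n)] for x in range(n)]
        # Bob's un-normalized conditional state: tau_k(alpha)(z) = (alpha (x) omega)(f_k (x) delta_z)
        bob = [sum(alpha[x] * f_k[x][y] * omega[y][z] for x in range(n) for y in range(n))
               for z in range(n)]
        c_k = sum(bob)                                         # probability that Alice observes f_k
        shifted = [p / c_k for p in bob]                       # normalized: alpha shifted by k
        corrected = [shifted[(z + k) % n] for z in range(n)]   # Bob undoes the shift
        results.append((c_k, corrected))
    return results


alpha = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
for c_k, corrected in teleport_classically(alpha):
    print(c_k, corrected == alpha)
```

Each line printed is `1/3 True`: every outcome is equally likely and Bob always ends up with $\alpha$.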

Theorem 8 ([11]). Suppose there exist a finite group $G$ acting transitively on A's pure states, and
a $G$-equivariant order-isomorphism $E(A) \simeq E(A)^*$. Then A supports a deterministic teleportation
protocol.

Entanglement Swapping. Suppose that, like Alice, Bob controls a bipartite system $B = B_1B_2$.
Assume here that $A_0$, $A_1$, $B_1$ and $B_2$ are all isomorphic to one another. Given an entangled state $\omega$
between $A_1$ and $B_1$, and isomorphism effects $f$ on $A = A_0A_1$ and $g$ on $B = B_1B_2$, we find that, for
any state $\mu$ on $A_0B_2$, we have (up to the obvious symmetrizers and associators)

$(f \otimes g)(\mu \otimes \omega) = g(\hat{\omega} \circ \hat{f} \circ \hat{\mu}^*).$

Since this holds for any choice of $g \in E(B)$, we have

$(\mu \otimes \omega)_{B|f} = \hat{\omega} \circ \hat{f} \circ \hat{\mu}^*.$

If $\tau = \hat{\omega} \circ \hat{f}$ is probabilistically reversible, then upon Bob's executing the reverse process, the state
$\mu$ has been transferred from $A_0B_2$ to $B = B_1B_2$.
Teleportation and Compact Closure. Let $\mathcal{C}$ be any symmetric monoidal category. A dual for an
object $A \in \mathcal{C}$ is an object $B \in \mathcal{C}$, together with two morphisms, $\eta : I \to B \otimes A$ and $\epsilon : A \otimes B \to I$
(called the unit and co-unit, respectively) such that

$(\epsilon \otimes \mathrm{id}_A) \circ (\mathrm{id}_A \otimes \eta) = \mathrm{id}_A \quad\text{and}\quad (\mathrm{id}_B \otimes \epsilon) \circ (\eta \otimes \mathrm{id}_B) = \mathrm{id}_B. \qquad (7)$

In view of the discussion above, if $\mathcal{C}$ is a monoidal probabilistic theory and $(f, \omega)$ is a conclusive
teleportation protocol for a pair of systems $A, B \in \mathcal{C}$, then the remote evaluation lemma tells us
that $f$ and $\omega$ function as a unit and co-unit, respectively, for A and B. A symmetric monoidal
category in which every object has a dual is said to be compact closed. A compact structure on a
compact closed category is a specification, for every object $A \in \mathcal{C}$, of a distinguished dual $A' \in \mathcal{C}$.
Where $A = A'$ for every $A \in \mathcal{C}$, this structure is degenerate.¹⁴
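The snake equations (7) can be checked concretely for finite-dimensional vector spaces, ignoring positivity and normalization. In this Python sketch, $A$ and $B$ are 2-dimensional with bases $a_k$, $b_i$; the co-unit is a hypothetical $\epsilon(a_k \otimes b_i) = E_{ki}$ for an invertible matrix $E$, and the unit is $\eta = \sum_{ij} H_{ij}\, b_i \otimes a_j$. Contracting the tensors explicitly shows that both equations hold when $H = E^{-1}$ and fail otherwise. This mirrors the remark that the inverse of an isomorphism state is a multiple of an isomorphism effect.

```python
from fractions import Fraction


def inverse_2x2(E):
    (p, q), (r, s) = E
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def snake_A(E, H, v):
    # (epsilon (x) id_A) o (id_A (x) eta) applied to v in A, where
    # eta = sum_ij H[i][j] b_i (x) a_j and epsilon(a_k (x) b_i) = E[k][i]
    n = len(v)
    # (id_A (x) eta)(v) has coefficients T[k][i][j] on a_k (x) b_i (x) a_j
    T = [[[v[k] * H[i][j] for j in range(n)] for i in range(n)] for k in range(n)]
    # epsilon contracts the first two legs
    return [sum(E[k][i] * T[k][i][j] for k in range(n) for i in range(n)) for j in range(n)]


def snake_B(E, H, w):
    # (id_B (x) epsilon) o (eta (x) id_B) applied to w in B
    n = len(w)
    # (eta (x) id_B)(w) has coefficients T[i][j][l] on b_i (x) a_j (x) b_l
    T = [[[H[i][j] * w[l] for l in range(n)] for j in range(n)] for i in range(n)]
    # epsilon contracts the last two legs
    return [sum(T[i][j][l] * E[j][l] for j in range(n) for l in range(n)) for i in range(n)]


E = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]   # hypothetical co-unit
H = inverse_2x2(E)                                             # candidate unit: H = E^{-1}
v = [Fraction(3), Fraction(-5)]
w = [Fraction(1, 2), Fraction(7)]
print(snake_A(E, H, v) == v, snake_B(E, H, w) == w)

# With any other choice of H the snake equations fail, e.g. H = identity:
I2 = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
print(snake_A(E, I2, v) == v)
```

The first line prints `True True` and the second `False`, since the first snake equation amounts to $EH = \mathbf{1}$.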

Proposition 9 ([15]). Let $\mathcal{C}$ be a monoidal probabilistic theory. The following are equivalent:

(a) $\mathcal{C}$ admits a compact closed structure;

(b) Every $A \in \mathcal{C}$ can be teleported through some $B \in \mathcal{C}$;

(c) Every morphism in $\mathcal{C}$ has the form $\hat{\omega} \circ \hat{f}$ for some bipartite state $\omega$ and bipartite effect $f$ in $\mathcal{C}$.

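Proof: The equivalence of (a) and (b) is clear from the preceding discussion. To see that these are
in turn equivalent to (c), suppose first that (a) and (b) hold. Choose for each $A \in \mathcal{C}$ a dual system
$A'$, a state $\omega_A \in \mathcal{C}(A \otimes A', I)$, and an effect $f_A \in \mathcal{C}(I, A' \otimes A)$ with $\hat{\omega}_A = \hat{f}_A^{-1}$. Then for any
morphism $\tau \in \mathcal{C}(A, B)$, let $f_\tau \in \mathcal{C}(I, A' \otimes B)$ be the effect $(\mathrm{id}_{A'} \otimes \tau) \circ f_A$. It is easily checked that then
$\hat{f}_\tau = \tau \circ \hat{f}_A$, so that $\tau = \hat{f}_\tau \circ \hat{\omega}_A$. Conversely, if (c) holds, then for each A, the identity mapping $\mathrm{id}_A$
factors as $\hat{\omega}_A \circ \hat{f}_A$ for some $\omega_A \in \mathcal{C}(B \otimes A, I)$ and some $f_A \in \mathcal{C}(I, A \otimes B)$. It follows that $\hat{\omega}_A = \hat{f}_A^{-1}$, so
this gives us a compact closed structure. □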

4.3 Steering 

Let B be a probabilistic model. An ensemble for a state $\beta \in \Omega(B)$ is a finite set of
states $\beta_i \in V(B)_+$ such that $\sum_i \beta_i = \beta$. We can understand such an ensemble as representing one possible
way of preparing the state $\beta$, namely, to choose one of the normalized states $\tilde{\beta}_i := \beta_i / u(\beta_i)$ with
probability $p_i = u_B(\beta_i)$.

One way to do this is to begin with a bipartite state $\omega$ on a non-signaling composite AB, with
marginal $\omega_2 = \beta$. Then for any observable $E = \{a_i\}$ on A, the un-normalized conditional states
$\beta_i := \hat{\omega}(a_i)$ are an ensemble for $\beta$. That is: by measuring $E$, we prepare not only the marginal
state $\omega_2$, but a particular ensemble for this state. By choosing to measure a different observable,
we will typically obtain a different ensemble for $\beta$. If A and B are quantum systems, and if $\omega$ is a
pure entangled state of AB, then any ensemble for $\omega_2$ can be obtained in this way from a suitable
choice of measurement on A. This phenomenon was first observed by Schrödinger [58], who called
it steering. The concept extends readily to the setting of an arbitrary non-signaling composite.
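Steering is easy to see numerically for two qubits. To keep the sketch self-contained we assume real amplitudes and the maximally entangled state $(e_0 \otimes e_0 + e_1 \otimes e_1)/\sqrt{2}$, and we only use projective measurements in rotated orthonormal bases; these choices are ours, not the source's. Measuring A in the basis at angle $\theta$ prepares the ensemble $\{\tfrac12 P_{u_\theta}, \tfrac12 P_{u_\theta^\perp}\}$ on B: different bases give different ensembles, all summing to the same marginal $\mathbf{1}/2$.

```python
import math


def projector(theta):
    # Rank-one projector onto the real unit vector (cos theta, sin theta)
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [s * c, s * s]]


def conditional_state_B(psi, a):
    # Un-normalized conditional state of B given an effect a on A, for the real pure state
    # Psi = sum_ij psi[i][j] e_i (x) e_j: rho[j][l] = sum_{i,k} psi[k][j] a[i][k] psi[i][l],
    # i.e. the partial trace over A of (a (x) 1)|Psi><Psi|
    n = len(psi)
    return [[sum(psi[k][j] * a[i][k] * psi[i][l] for i in range(n) for k in range(n))
             for l in range(n)] for j in range(n)]


def ensemble(psi, theta):
    # Measuring A in the orthonormal basis {u_theta, u_theta_perp} prepares this ensemble on B
    return [conditional_state_B(psi, projector(theta)),
            conditional_state_B(psi, projector(theta + math.pi / 2))]


r = 1 / math.sqrt(2)
psi = [[r, 0.0], [0.0, r]]          # (e0 (x) e0 + e1 (x) e1) / sqrt(2)

for theta in (0.0, math.pi / 8):
    m0, m1 = ensemble(psi, theta)
    total = [[m0[j][l] + m1[j][l] for l in range(2)] for j in range(2)]
    print(theta, m0, total)          # different ensemble members, same total (the marginal I/2)
```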

Definition 18. Let AB be a non-signaling composite of probabilistic models A and B. A bipartite
state $\omega \in \Omega(AB)$ is steering for its B-marginal, or B-steering for short, iff, for every ensemble (convex
decomposition) $\omega_2 = \sum_i \beta_i$, where the $\beta_i$ are un-normalized states of B, there exists an observable
$E = \{a_i\}$ on A with $\beta_i = \hat{\omega}(a_i)$. We say that $\omega$ is bi-steering iff it is steering for both marginals.

The relevance of steering to information processing became evident when Bennett and Brassard
[?], in the same paper that introduced quantum key distribution, considered a natural quantum
scheme for another important cryptographic primitive, bit commitment, and showed that ensemble
steering can be used to break it. In the proposed scheme, the two possible values to which Alice
can commit are represented by two distinct ensembles for the same density matrix. She is to send
samples from the ensemble to Bob in order to commit, and later reveal which states she drew so
that Bob can check that she used the claimed ensemble. However, Alice can cheat by sending to Bob,
not a draw from the ensemble, but one of two systems in an entangled pure bipartite state with the
specified density matrix as its marginal. Keeping the other system, she can realize either ensemble
after she has already sent the systems to Bob, by making measurements on her entangled system,
enabling her to perfectly mimic commitment to either bit.

¹⁴ Duals, where they exist, are canonically isomorphic. Hence, for most purposes, the choice of one rather than
another object as "the" dual is irrelevant. The existence of a degenerate compact structure is, however, a real
constraint [15, 60].

Later, Mayers, and Lo and Chau, showed that no information-theoretically secure quantum bit
commitment protocol can exist. The techniques they used to defeat putative protocols do not
literally use steering, but are closely related to the Bennett-Brassard steering attack, in particular in
Alice's retention of a system purifying the systems she sends to Bob in the course of the protocol.

The paper [14] studies steering in the context of general probabilistic theories. If $\alpha$ is any state
on A and $\beta$ is a pure state on B, then $\omega = \alpha \otimes \beta$ is trivially steering for $\omega_2 = \beta$, since the latter
has no non-trivial ensembles. In particular, any pure product state will be steering for both of its
marginals. Any isomorphism state $\omega \in \Omega(AB)$ will also be steering.

It follows almost immediately from the definition that if $\omega$ is steering for its B-marginal, then
the image $\hat{\omega}(E(A)_+)$ of the positive cone in $E(A)$ is a face of $V(B)_+$. Indeed, we have

Lemma 6. If $\omega$ is steering, then $\hat{\omega}(E(A)_+) = \mathrm{Face}(\omega_2)$.

Here $\mathrm{Face}(\omega_2)$ refers to the face generated by $\omega_2$, i.e., the smallest face of $V(B)_+$ containing $\omega_2$.
The converse of Lemma 6 is false.

A probabilistic theory $\mathcal{C}$ supports uniform universal steering if, for every system $B \in \mathcal{C}$, there
exists a system $A_B \in \mathcal{C}$ such that every state $\beta \in \Omega(B)$ is the marginal of some B-steering state
$\omega \in \Omega(A_B B)$. If one can always take $A_B = B$, we say that $\mathcal{C}$ supports universal self-steering.

Proposition 10. Let $\omega \in \Omega(AB)$ be steering for $\omega_2$, where $\omega_2$ is interior to $V(B)_+$, so that
$\mathrm{Face}(\omega_2) = V(B)_+$. If $\hat{\omega}$ is injective (non-singular), then $\hat{\omega}$ is an order-isomorphism. If $V(B)$ is
irreducible, therefore, by Proposition 4, $\omega$ is pure.

In other words, if A and B have the same dimension, then the states that are steering for an 
interior marginal are precisely the isomorphism states (and hence, are steering for both marginals). 
Steering is closely related to an important property of quantum theory called homogeneity. 

Definition 19. Let $S$ be a group of order-automorphisms of an ordered vector space $E$. We say
that $E$ is homogeneous with respect to $S$ if $S$ acts transitively on the interior of the positive cone
$E_+$. That is, for every pair of interior points $a, b$ of $E_+$, there exists an element $g \in S$ with $ga = b$.
We say $E$ is homogeneous if it is homogeneous with respect to some group of order-automorphisms,
or, equivalently, if it is homogeneous with respect to the group $\mathrm{Aut}(E)$ of all order-automorphisms.
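As a small illustration of Definition 19, consider the cone of positive semidefinite real symmetric $2 \times 2$ matrices, a real analogue of the quantum cone discussed next. For interior points $a, b$, the congruence $g(x) = c\,x\,c^{T}$ with $c = b^{1/2} a^{-1/2}$ is an order-automorphism (its inverse $x \mapsto c^{-1} x\, c^{-T}$ is also positive) and sends $a$ to $b$. This Python sketch uses the closed-form square root of a $2 \times 2$ positive-definite matrix; the matrices `a` and `b` are hypothetical examples.

```python
import math


def sqrt_psd_2x2(M):
    # Square root of a 2x2 symmetric positive-definite matrix:
    # sqrt(M) = (M + sqrt(det M) I) / sqrt(tr M + 2 sqrt(det M))
    (p, q), (_, s) = M
    d = math.sqrt(p * s - q * q)
    t = math.sqrt(p + s + 2 * d)
    return [[(p + d) / t, q / t], [q / t, (s + d) / t]]


def inv_2x2(M):
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(X):
    return [list(row) for row in zip(*X)]


def transporter(a, b):
    # c = b^{1/2} a^{-1/2}, so that g(x) = c x c^T maps the PSD cone onto itself and a to b
    return mul(sqrt_psd_2x2(b), inv_2x2(sqrt_psd_2x2(a)))


def act(c, x):
    # The order-automorphism g(x) = c x c^T
    return mul(mul(c, x), transpose(c))


a = [[2.0, 1.0], [1.0, 3.0]]     # interior point: positive definite
b = [[1.0, -0.5], [-0.5, 4.0]]   # another interior point
c = transporter(a, b)
print(act(c, a))                 # approximately b
```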

It can be shown that the cone $\mathcal{L}_+(\mathcal{H})$ of positive operators on a finite-dimensional Hilbert space
$\mathcal{H}$ is homogeneous with respect to the group of order-automorphisms of $\mathcal{L}_h(\mathcal{H})$. As we discuss below
in Section 5, the combination of homogeneity and strong self-duality comes close to characterizing
finite-dimensional quantum theory among probabilistic theories generally. More precisely, the
Koecher-Vinberg Theorem asserts that if $E$ is an ordered linear space whose positive cone $E_+$ is
both homogeneous and self-dual, then $E$ can be given the structure of a Euclidean Jordan algebra.
With this in mind, the following result is particularly intriguing:

Theorem 11. For a model with irreducible state space V(A) the following are equivalent: 

(a) A is homogeneous; 

(b) Every normalized state in the interior of $\Omega(A)$ is the $A$-marginal of an isomorphism state in $B \otimes_{\max} A$, where $B$ is any (fixed) model with state space order-isomorphic to $V(A)^*$.

From this we obtain: 

Corollary 12. For any model with irreducible state space A, the following are equivalent: 
(a) $V(A)_+$ is weakly self-dual and homogeneous;
(b) Every normalized state in the interior of $\Omega(A)$ is the marginal of an isomorphism state in $A \otimes_{\max} A$.

Proposition 10 combined with Theorem 11 gives

Proposition 13. In any theory that supports uniform universal steering, every irreducible, finite-dimensional state space in the theory is homogeneous.

In light of Corollary 12, we also have

Proposition 14. In any theory that supports universal self-steering, every irreducible, finite-dimensional state space in the theory is homogeneous and weakly self-dual.

Therefore, the gap between probabilistic theories allowing universal self-steering and those whose state spaces are Jordan-algebraic is just the gap between weak and strong self-duality.

In [13] it was shown that an asymptotically exponentially secure bit commitment protocol, based
(like the original Bennett-Brassard one-qubit protocol) on the nonuniqueness of convex decompo- 
sition in nonclassical state spaces, exists in any theory containing some nonclassical state spaces, 
coupled only by the minimal tensor product (so that there is no entanglement between them). In 
a nonclassical theory in which all states can be steered, by contrast, this type of bit commitment 
protocol can always be defeated. 



4.4 Entropy and Information Causality 

Classical information theory begins with the Gibbs-Shannon entropy $H(p) = -\sum_i p_i \log p_i$ of a discrete probability weight $p_1, \dots, p_n$. Analogously, in quantum theory the von Neumann entropy of the state corresponding to a density operator $\rho$ is given by $S(\rho) := -\mathrm{Tr}(\rho \log \rho)$. This is related to the classical Gibbs-Shannon entropy in two important ways. On one hand, $S(\rho)$ is the minimum of the Gibbs-Shannon entropies $-\sum_i p_i \log p_i$ of the probability weights $p_i = \mathrm{Tr}(\rho e_i)$ that $\rho$ induces on quantum tests $\{e_i\}$. (This minimum turns out to be achieved when the measurement is in a diagonalizing basis.) Alternatively, $S(\rho)$ is the minimum Gibbs-Shannon entropy of the probabilities $p_i$ arising in representations of $\rho$ as a mixture $\rho = \sum_i p_i \rho_i$ of pure states $\rho_i$. (This again turns out to be achieved for an ensemble whose states are the rank-one projectors corresponding to a diagonalizing basis.)
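A minimal numerical sketch of the first characterization, assuming a real qubit density matrix $\rho = \begin{pmatrix} a & b \\ b & d \end{pmatrix}$. The particular entries, the helper names, and the restriction to real orthonormal bases are illustrative choices, not taken from the text. The code compares $S(\rho)$, computed from the eigenvalues, with the smallest Shannon entropy of the outcome distributions $p_i = \mathrm{Tr}(\rho e_i)$ found on a grid of measurement bases; logarithms are base 2.

```python
import math

def shannon(probs):
    """Shannon entropy (bits) of a finite probability vector."""
    return sum(-p * math.log2(p) for p in probs if p > 1e-15)

def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mean + radius, mean - radius

def outcome_probs(a, b, d, theta):
    """Born-rule probabilities for the real orthonormal basis
    {(cos t, sin t), (-sin t, cos t)} applied to rho = [[a, b], [b, d]]."""
    c, s = math.cos(theta), math.sin(theta)
    p0 = a * c * c + 2 * b * c * s + d * s * s   # <v0|rho|v0>
    return p0, 1 - p0                            # Tr(rho) = 1

# A hypothetical mixed qubit state (real, trace one, positive).
a, b, d = 0.7, 0.2, 0.3
von_neumann = shannon(eigenvalues_2x2_symmetric(a, b, d))

# Scan measurement bases; the minimum should sit at the diagonalizing basis.
grid = [k * math.pi / 2000 for k in range(2000)]
min_measurement = min(shannon(outcome_probs(a, b, d, t)) for t in grid)

print(f"S(rho)         = {von_neumann:.6f}")
print(f"min_E H_E(rho) = {min_measurement:.6f}")
```

The grid happens to contain the diagonalizing angle $\theta = \pi/8$ for this $\rho$; for other states one would use a finer scan or the eigenvector directly.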

Both of these characterizations make sense in the context of an arbitrary probabilistic model, 
but in general, they are not equivalent. 

Definition 20. Let $\alpha$ be a state on $A$. For each test $E \in \mathcal{M}(A)$, define the local measurement entropy of $\alpha$ at $E$, $H_E(\alpha)$, to be the classical (Shannon) entropy of the restriction $\alpha|_E$, i.e.,

$$H_E(\alpha) := -\sum_{x \in E} \alpha(x) \log \alpha(x).$$

The measurement entropy of $\alpha$, $H(\alpha)$, is the infimum of $H_E(\alpha)$ as $E$ ranges over $\mathcal{M}(A)$, i.e.,

$$H(\alpha) := \inf_{E \in \mathcal{M}(A)} H_E(\alpha).$$

Note that the measurement entropy of a state $\alpha \in \Omega(A)$ depends entirely on the structure of the test space $\mathcal{M}(A)$, and not on the geometry of the state space $\Omega$.
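Definition 20 translates directly into code for a finite test space. The sketch below assumes a representation of our own choosing: tests are a dict from hypothetical test names to tuples of outcomes, and a state is a dict of outcome probabilities. As an illustrative example it uses a "square bit" test space with two binary tests. The state shown is certain on test E, so its measurement entropy is 0 even though it is uniform on test F.

```python
import math

def local_measurement_entropy(state, test):
    """H_E(alpha): Shannon entropy (bits) of the state restricted to test E."""
    return sum(-state[x] * math.log2(state[x]) for x in test if state[x] > 0)

def measurement_entropy(state, tests):
    """H(alpha): for a finite collection of tests the infimum is a minimum."""
    return min(local_measurement_entropy(state, t) for t in tests.values())

# Square-bit test space: two tests, each with two outcomes.
tests = {"E": ("a0", "a1"), "F": ("b0", "b1")}

# A state must assign total probability 1 to every test.
alpha = {"a0": 1.0, "a1": 0.0, "b0": 0.5, "b1": 0.5}
for name, test in tests.items():
    assert abs(sum(alpha[x] for x in test) - 1.0) < 1e-12, name

print({name: local_measurement_entropy(alpha, t) for name, t in tests.items()})
print("H(alpha) =", measurement_entropy(alpha, tests))
```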

We shall assume in what follows that the measurement entropy of a state is actually achieved on some test, i.e., that $H(\alpha) = H_E(\alpha)$ for some $E \in \mathcal{M}(A)$. This is the case in quantum theory, and can be shown to hold much more generally, given some rather weak analytic requirements on the model $A$ ([12], Appendix B). It follows that $H(\alpha) = 0$ if and only if there is a test such that $\alpha$ assigns probability 1 to one of its outcomes.

Notation: It will often be convenient to write $H(\alpha)$ as $H(A)$, where context makes clear which state is being considered. If $AB$ is a non-signaling composite, and $H(AB)$ represents $H(\omega)$, we shall write $H(A)$ and $H(B)$ for the marginal entropies $H(\omega_1)$ and $H(\omega_2)$. It is easily checked that the measurement entropy is subadditive, i.e.,

$$H(AB) \le H(A) + H(B).$$

Definition 21. Let $\alpha$ be a state on $A$. The mixing (or preparation) entropy for $\alpha$, denoted $S(\alpha)$, is the infimum of the classical (Shannon) entropy $H(p_1, \dots, p_n)$ over all finite convex decompositions $\alpha = \sum_i p_i \alpha_i$ with $\alpha_i$ pure states in $\Omega(A)$.

Again, we write $S(A)$ for $S(\alpha)$ where $\alpha$ belongs to the state space $\Omega$ of a system $A = (\mathcal{M}, \Omega)$. In contrast to measurement entropy, the mixing entropy of a state depends only on the geometry of the state space $\Omega$, and is independent of the choice of test space $\mathcal{M}(A)$. The mixing entropy is essentially the same as the entropy defined for elements of compact convex sets by A. Uhlmann in [?].

We call a theory monoentropic if mixing entropy equals measurement entropy for every state of every model in the theory. Appendix B of [12] considers some implications of monoentropicity. For instance, it is shown that any monoentropic model $A$ in which the set of pure states is closed in $\Omega(A)$ is sharp.
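To see that the two entropies can differ, consider a "square bit" (an illustrative example, not one discussed in the text). Its state space is the square $[0,1]^2$: a state $(p, q)$ gives probability $p$ to the first outcome of one binary test and $q$ to the first outcome of a second binary test, and the pure states are the four vertices. The sketch below computes the mixing entropy by minimizing over decompositions into distinct vertices (merging repeated pure states can only lower the entropy). These decompositions form a one-parameter family, and since Shannon entropy is concave in the weights, the minimum lies at an endpoint of the feasible interval. All function names are hypothetical.

```python
import math

def shannon(weights):
    """Shannon entropy in bits, ignoring zero weights."""
    return sum(-w * math.log2(w) for w in weights if w > 1e-15)

def vertex_weights(p, q, t):
    """Weights on vertices (0,0), (0,1), (1,0), (1,1) reproducing state (p, q),
    parametrized by t = weight on (1,1)."""
    return (1 - p - q + t, q - t, p - t, t)

def mixing_entropy_square(p, q):
    """S(alpha) for alpha = (p, q); the entropy is concave in t, so its
    minimum over the feasible interval is attained at an endpoint."""
    t_min, t_max = max(0.0, p + q - 1), min(p, q)
    return min(shannon(vertex_weights(p, q, t)) for t in (t_min, t_max))

def measurement_entropy_square(p, q):
    """H(alpha): minimum of the Shannon entropies on the two binary tests."""
    return min(shannon((p, 1 - p)), shannon((q, 1 - q)))

p, q = 1.0, 0.5   # certain outcome on the first test, uniform on the second
print("H(alpha) =", measurement_entropy_square(p, q))
print("S(alpha) =", mixing_entropy_square(p, q))
```

Here $H(\alpha) = 0$ but $S(\alpha) = 1$ bit, so this model is not monoentropic; consistently with the result just quoted, its pure states are closed but it is not sharp.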

We define conditional and mutual information in terms of measurement entropy via formulas 
that also hold classically: 

Definition 22. The conditional measurement entropy of $A$ given $B$ is defined to be

$$H(A|B) := H(AB) - H(B). \qquad (8)$$

The [measurement-based] mutual information is defined to be

$$I(A:B) := H(A) + H(B) - H(AB). \qquad (9)$$
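For classical systems, where each system has a single test and measurement entropy is ordinary Shannon entropy, Eqs. (8) and (9) reduce to the familiar formulas. A small sketch, with a hypothetical joint distribution of two correlated bits stored as a dict keyed by outcome pairs, computes both quantities. A positive $I(A:B)$ is the subadditivity inequality in another form.

```python
import math
from collections import defaultdict

def H(dist):
    """Shannon entropy (bits) of a dict {outcome: probability}."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, index):
    """Marginal distribution on the coordinate(s) listed in `index`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in index)] += p
    return dict(out)

# Hypothetical joint distribution of two correlated bits (A, B).
joint_ab = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

H_AB = H(joint_ab)
H_A, H_B = H(marginal(joint_ab, [0])), H(marginal(joint_ab, [1]))
cond_A_given_B = H_AB - H_B          # Eq. (8)
mutual_info = H_A + H_B - H_AB       # Eq. (9)
print(f"H(A|B) = {cond_A_given_B:.4f}, I(A:B) = {mutual_info:.4f}")
```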

Intuitively, one might expect that $I(A:B)$ should not decrease if we recognize that $B$ is a part of some larger composite system $BC$, i.e., we might expect that $I(A:B) \le I(A:BC)$. Simple algebraic manipulations (using Eqs. (8) and (9)) allow us to reformulate this condition in various ways.

Lemma 7. The following are equivalent:

(a) $I(A:BC) \ge I(A:B)$

(b) $H(A|BC) \le H(A|B)$

(c) $H(AB) + H(BC) - H(B) \ge H(ABC)$

(d) $I(A:B|C) \ge 0$, where $I(A:B|C) = H(A|C) + H(B|C) - H(AB|C)$.

The measurement entropy is said to be strongly subadditive if it satisfies the equivalent conditions 
(a)-(d). (Condition (c) is what is usually termed "strong subadditivity" (SSA).) A probabilistic the- 
ory in which conditions (a)-(d) are satisfied for all systems A, B and C will also be called strongly 
subadditive. Despite the intuition mentioned above, strong subadditivity can fail in general theories, which is perhaps a signal that mutual information as defined above should not be interpreted in general as "the information each system contains about the other".
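Classical (Shannon) entropy is strongly subadditive, so conditions (a)-(d) hold whenever $A$, $B$ and $C$ are classical. The sketch below spot-checks all four on randomly generated joint distributions of three bits. This is a numerical sanity check, not a proof, and the helper names are illustrative.

```python
import itertools
import math
import random

def H(joint, keep):
    """Shannon entropy (bits) of the marginal on the coordinates in `keep`."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + p
    return sum(-p * math.log2(p) for p in marg.values() if p > 0)

def random_joint(rng):
    """A random probability distribution on three bits (A, B, C)."""
    outcomes = list(itertools.product((0, 1), repeat=3))
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}

A, B, C = 0, 1, 2
rng = random.Random(0)
tol = 1e-12
for _ in range(1000):
    j = random_joint(rng)
    I_A_BC = H(j, [A]) + H(j, [B, C]) - H(j, [A, B, C])
    I_A_B = H(j, [A]) + H(j, [B]) - H(j, [A, B])
    assert I_A_BC >= I_A_B - tol                                       # (a)
    assert H(j, [A, B, C]) - H(j, [B, C]) <= H(j, [A, B]) - H(j, [B]) + tol  # (b)
    assert H(j, [A, B]) + H(j, [B, C]) - H(j, [B]) >= H(j, [A, B, C]) - tol  # (c)
    # (d): I(A:B|C) = H(AC) + H(BC) - H(ABC) - H(C) >= 0
    assert H(j, [A, C]) + H(j, [B, C]) - H(j, [A, B, C]) - H(j, [C]) >= -tol
print("strong subadditivity held on all samples")
```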

The Holevo Bound and the Data Processing Inequality. The strong subadditivity inequality is crucial to deriving bounds on many quantum information-transmission protocols, and the conditions under which it is satisfied with equality are also of great importance. Another extremely important inequality (derivable, in the quantum setting, from strong subadditivity) is the Holevo bound, which figures in an expression for the highest achievable rate of classical information transmission through a noisy quantum channel.
The standard formulation of the Holevo bound can apply to a general theory, if the entropies are interpreted as measurement entropies: it asserts that if Alice prepares the state $\beta_x$ of Bob's system with probability $p_x$ ($x \in E$), so that Bob's average state is $\beta = \sum_{x \in E} p_x \beta_x$, then, for any measurement $F$ that Bob can make on his system,

$$I(E:F) \le \chi,$$

where $\chi := H(\beta) - \sum_{x \in E} p_x H(\beta_x)$ (often called the Holevo quantity).

Suppose that Alice has a classical system $A = (\{E\}, \Delta(E))$ and Bob a general system $B$. Alice's system is to serve as a record of which state of $B$ she prepared. The situation above is modeled by the joint state $\omega_{AB} = \sum_{x \in E} p_x \delta_x \otimes \beta_x$, where $\delta_x$ is a deterministic state of Alice's system with $\delta_x(x) = 1$. Bob's marginal state is $\omega_2 = \sum_{x \in E} p_x \beta_x$. By Lemma ??, $H(\omega_{AB}) = H(A) + \sum_{x \in E} p_x H(\beta_x)$. Hence,

$$\begin{aligned}
I(A:B) &= H(A) + H(B) - H(AB) \\
&= H(A) + H(B) - \Big(H(A) + \sum_{x \in E} p_x H(\beta_x)\Big) \\
&= H(\omega_2) - \sum_{x \in E} p_x H(\beta_x) = \chi.
\end{aligned}$$

So the content of the Holevo bound is simply that the mutual information between the measurement of Alice's classical system and any measurement on Bob's system is no greater than $I(A:B)$:

$$I(E:F) \le I(A:B).$$

(While this is certainly natural, in general theories it does not always hold.)
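In quantum theory the measurement entropy of a state coincides with its von Neumann entropy, so the bound can be spot-checked numerically. The sketch below is pure Python. The ensemble (two equiprobable real pure qubit states at angles $0$ and $\pi/6$) and all helper names are illustrative assumptions. It computes $\chi$, scans Bob's measurement over a grid of real orthonormal bases, and reports the largest $I(E:F)$ found.

```python
import math

def shannon(probs):
    return sum(-p * math.log2(p) for p in probs if p > 1e-15)

def density(theta):
    """Projector onto the real pure state (cos theta, sin theta), as (a, b, d)
    for the matrix [[a, b], [b, d]]."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * c, c * s, s * s)

def von_neumann(a, b, d):
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return shannon((mean + radius, mean - radius))

def born(rho, phi):
    """Outcome probabilities for the real orthonormal basis at angle phi."""
    a, b, d = rho
    c, s = math.cos(phi), math.sin(phi)
    p0 = a * c * c + 2 * b * c * s + d * s * s
    return (p0, 1 - p0)

priors = (0.5, 0.5)
states = [density(0.0), density(math.pi / 6)]

avg = tuple(sum(p * rho[i] for p, rho in zip(priors, states)) for i in range(3))
chi = von_neumann(*avg) - sum(p * von_neumann(*rho) for p, rho in zip(priors, states))

def mutual_information(phi):
    """I(E:F) between Alice's label E and Bob's outcome F in basis phi."""
    joint = [[p * q for q in born(rho, phi)] for p, rho in zip(priors, states)]
    p_f = [joint[0][f] + joint[1][f] for f in range(2)]
    return shannon(priors) + shannon(p_f) - shannon([q for row in joint for q in row])

best = max(mutual_information(k * math.pi / 1000) for k in range(1000))
print(f"chi = {chi:.4f}, best I(E:F) over scanned bases = {best:.4f}")
```

For non-orthogonal states the gap between the best $I(E:F)$ and $\chi$ is strict, which is the usual statement that the Holevo quantity need not be achieved by a single-copy measurement.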

Both strong subadditivity and the Holevo bound are instances of a more basic principle. The data processing inequality (DPI) asserts that, for any systems $A$, $B$ and $C$, and any physical process $\mathcal{E}: B \to C$,

$$I(A:\mathcal{E}(B)) \le I(A:B),$$

where $I(A:\mathcal{E}(B))$ refers to the mutual information of the state resulting from applying $\mathrm{id}_A \otimes \mathcal{E}$ to the state of $AB$. The strong subadditivity of entropy amounts to the DPI for the process that simply discards a system (the marginalization map $BC \to B$, in the form (a) of Lemma 7). The Holevo bound is the DPI for the special case of measurements, which can be understood as processes taking a system into a classical system which records the outcome.
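In the purely classical case the DPI can be checked directly: a process $\mathcal{E}: B \to C$ is then a row-stochastic matrix acting on Bob's outcome. The sketch below (the joint distribution, the bit-flip probability and the helper names are illustrative assumptions) applies a binary symmetric channel to $B$ and compares the two mutual informations.

```python
import math

def H(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(A:B) for a joint distribution given as a nested list joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return H(p_a) + H(p_b) - H([q for row in joint for q in row])

def apply_channel(joint, channel):
    """Apply id_A (x) E, where channel[b][c] = Pr(c | b) is row-stochastic."""
    return [[sum(row[b] * channel[b][c] for b in range(len(row)))
             for c in range(len(channel[0]))] for row in joint]

joint_ab = [[0.4, 0.1], [0.1, 0.4]]
flip = 0.2                                    # hypothetical bit-flip probability
bsc = [[1 - flip, flip], [flip, 1 - flip]]

before = mutual_information(joint_ab)
after = mutual_information(apply_channel(joint_ab, bsc))
print(f"I(A:B) = {before:.4f}  >=  I(A:E(B)) = {after:.4f}")
```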

Information Causality. In a widely discussed paper [53], M. Pawlowski et al. introduced a constraint on a non-signaling probabilistic theory, which they called information causality, in terms of the following protocol. Two parties, Alice and Bob, share a joint non-signaling state, known to both of them. Alice receives a random bit string $e = (e_1, \dots, e_N)$ of length $N$; after making measurements, she sends Bob a message $f$, a bit string of length at most $m$. Bob receives a random variable $G$ encoding a number $k \in \{1, \dots, N\}$, which he takes as the instruction to guess the value of Alice's $k$-th bit. After making a suitable measurement, and taking into account both its outcome and Alice's message, Bob produces his guess, $b_k$. Information causality is the requirement that

$$\sum_{k=1}^{N} I(e_k : b_k \mid G = k) \le m. \qquad (10)$$

The main result of [53] is that if a theory contains states that violate the CHSH inequality by more than the Tsirel'son bound, then it violates information causality. In particular, if Alice and Bob can share PR boxes, then using a protocol due to van Dam [?], they can violate information causality maximally, meaning that Bob's guess is correct with certainty, and the left-hand side of Equation (10) is $N$. Pawlowski et al. also give a proof, using fairly standard manipulations of quantum mutual information, that quantum theory does satisfy information causality.
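The following Python sketch simulates the PR-box protocol in the simplest case $N = 2$, $m = 1$. The PR box is modeled by a hypothetical function `pr_box` that returns outputs $x, y$ with $x \oplus y = s \wedge t$, each output uniformly random on its own. A single function can produce these correlations only because it sees both parties' inputs, which is exactly what no local model can do. The protocol steps follow the usual presentation of van Dam's construction as we understand it (Alice inputs $e_1 \oplus e_2$ and sends $e_1 \oplus x$; Bob inputs his index). Bits are indexed from 0 in the code.

```python
import itertools
import random

def pr_box(s, t, rng):
    """Idealized PR box: uniformly random x, with y chosen so that x XOR y = s AND t."""
    x = rng.randrange(2)
    return x, x ^ (s & t)

def van_dam_round(a0, a1, k, rng):
    """One round: Alice holds bits (a0, a1); Bob must output a_k."""
    x, y = pr_box(a0 ^ a1, k, rng)   # Alice inputs a0 XOR a1, Bob inputs k
    message = a0 ^ x                  # the single bit Alice sends
    return message ^ y                # equals a0 XOR ((a0 XOR a1) AND k)

rng = random.Random(1)
for a0, a1, k in itertools.product((0, 1), repeat=3):
    for _ in range(20):
        assert van_dam_round(a0, a1, k, rng) == (a0, a1)[k]
print("Bob's guess was correct in every round")
```

With Alice's bits uniform and Bob's guess always correct, each term in (10) equals one bit, so the left-hand side is $2 > m = 1$.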

One of the principal results of [12] is a sufficient condition for a general probabilistic theory to be information-causal. The following is a strengthening of that result:
Theorem 15. Suppose that a theory is strongly subadditive, and satisfies the Holevo bound. Then
the theory satisfies information causality. It follows that any theory satisfying these conditions cannot 
violate Tsirel'son's bound. 

Since strong subadditivity and the Holevo bound follow from the data processing inequality, we 
have the following: 

Corollary 16. Any theory in which measurement-based mutual information satisfies the data processing inequality satisfies information causality.

In [12], monoentropicity was assumed in addition to SSA and the Holevo bound. As noted there, it was only used to derive that $H(A|B) \ge 0$ when $A$ is classical. However, this follows easily from strong subadditivity in the equivalent (cf. Lemma 7) form $I(A:B|C) \ge 0$, when we let $A$ and $B$ be identical perfectly correlated classical systems. We have

$$\begin{aligned}
I(A:B|C) &= H(A|C) + H(B|C) - H(AB|C) \qquad (11)\\
&= H(AC) - H(C) + H(BC) - H(C) - H(ABC) + H(C) \qquad (12)\\
&= H(AC) + H(BC) - H(ABC) - H(C). \qquad (13)
\end{aligned}$$

Since $A$ and $B$ are perfectly correlated classical systems, $H(AC) = H(BC) = H(ABC)$. Consequently, in this case $I(A:B|C) = H(AC) - H(C) = H(A|C)$. By SSA, this is $\ge 0$. □

4.5 Other developments 

There is much more to say about information processing in general probabilistic theories than we 
have room to discuss here. We remark in particular on [20], in which a version of the de Finetti theorem is proved for states on test spaces.



5 Characterizing Quantum Theory 

As we've seen, a great number of information-processing phenomena first discovered in association with quantum theory are actually generically post-classical, rather than specifically quantum-mechanical, in character. This brings us back to the question of how to characterize quantum theory in operational or probabilistic terms. The idea is to identify one or more features of quantum theory that can be expressed in purely operational-probabilistic terms (roughly, without any special reference to the Hilbert space structure, but only in terms of primitive concepts such as states, effects, tests, processes, etc.) and that, taken together, uniquely specify quantum (or quantum-plus-classical) models. This is an old problem, and also a somewhat vague one, since what counts as a satisfactory solution will be, to some extent, a matter of taste. Even so, striking progress has been made in the past several years, and several different, more-or-less satisfactory characterizations of quantum mechanics as a probability theory have been found [refs]. In this section, we review one of these [11] [70] [72] [18], which makes use of the equivalence between homogeneous self-dual cones and Euclidean Jordan algebras.

5.1 Homogeneity and Self-Duality 

Let $E$ be (for the moment) any finite-dimensional ordered linear space. Given a bilinear form $\mathfrak{B}: E \times E \to \mathbb{R}$, we define the internal dual (with respect to $\mathfrak{B}$) of the cone $E_+$ to be the cone

$$E_+^{\mathfrak{B}} := \{a \in E \mid \mathfrak{B}(a, x) \ge 0 \ \text{for all } x \in E_+\}.$$

We say that $\mathfrak{B}$ is positive on $E_+$, or simply positive, iff $E_+ \subseteq E_+^{\mathfrak{B}}$; in other words, iff the linear mapping $\beta: E \to E^*$ given by $\beta(a)(x) = \mathfrak{B}(a, x)$ is positive.

15 The realization that Theorem 4 of [12] could be strengthened this way grew out of discussions between some of the authors of [12] while the article was in press, but too late for inclusion in the published version.
Definition 23. $E$ is self-dual with respect to $\mathfrak{B}$ iff $E_+ = E_+^{\mathfrak{B}}$. We shall say that $E$ is weakly self-dual iff there exists a bilinear form $\mathfrak{B}$ with respect to which $E$ is self-dual, and strongly self-dual iff there exists an inner product on $E$ having this feature.

Weak self-duality is equivalent to the existence of an isomorphism state in $A \otimes_{\max} A$. As discussed above, this is equivalent to the requirement that there exist some composite of three copies of $A$ that supports a teleportation protocol, and to the requirement that states on $A$ arise as marginals of steering states in a composite of $A$ with itself [JJ]. Strong self-duality is harder to motivate, but we will discuss several ways in which it can be justified in the next section.

Recall that $E$ is homogeneous with respect to a group $\mathcal{G}$ of order-automorphisms if $\mathcal{G}$ acts transitively on the interior of the positive cone $E_+$, so that for every pair of interior points $a, b$ of $E_+$, there exists an element $g \in \mathcal{G}$ with $ga = b$.

Classical and quantum probabilistic models are both homogeneous and self-dual. Somewhat more generally, let $E$ be a Euclidean Jordan algebra. This is a finite-dimensional real vector space $E$ equipped with a commutative bilinear operation $\bullet$ satisfying the Jordan identity $a^2 \bullet (b \bullet a) = (a^2 \bullet b) \bullet a$ for all $a, b \in E$, and equipped with a canonical trace such that $\langle a, b\rangle := \mathrm{Tr}(a \bullet b)$ is an inner product, with $\langle a \bullet b, c\rangle = \langle a, b \bullet c\rangle$ for all $a, b, c \in E$. The set $E_+ = \{a^2 \mid a \in E\}$ (where $a^2 = a \bullet a$) is a cone in $E$, and one can show that it is homogeneous with respect to the group of order-automorphisms of $E$, and self-dual with respect to the tracial inner product. Remarkably, there is a converse, to be found in work of M. Koecher [32] and E. Vinberg [55].
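The standard example is the space of real symmetric matrices with $a \bullet b = (ab + ba)/2$ and the ordinary matrix trace. The short pure-Python spot-check below, using hypothetical random $3 \times 3$ real symmetric matrices, tests commutativity, the Jordan identity, and the associativity of the trace form. It is a numerical sanity check, not a proof.

```python
import random

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def jordan(x, y):
    """Jordan product a . b = (ab + ba) / 2."""
    xy, yx = matmul(x, y), matmul(y, x)
    n = len(x)
    return [[(xy[i][j] + yx[i][j]) / 2 for j in range(n)] for i in range(n)]

def trace(x):
    return sum(x[i][i] for i in range(len(x)))

def random_symmetric(n, rng):
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = rng.uniform(-1, 1)
    return m

def close(x, y, tol=1e-12):
    return all(abs(p - q) < tol for rx, ry in zip(x, y) for p, q in zip(rx, ry))

rng = random.Random(42)
a, b, c = (random_symmetric(3, rng) for _ in range(3))
a2 = jordan(a, a)

commutative = close(jordan(a, b), jordan(b, a))
jordan_identity = close(jordan(a2, jordan(b, a)), jordan(jordan(a2, b), a))
# <a . b, c> = <a, b . c> for the trace form <x, y> = Tr(x . y)
trace_associative = abs(trace(jordan(jordan(a, b), c))
                        - trace(jordan(a, jordan(b, c)))) < 1e-12
print(commutative, jordan_identity, trace_associative)
```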

If $G$ is any closed subgroup of $\mathrm{Aut}(E)$ acting transitively on the interior of $E_+$, then $G$ is a Lie subgroup of $GL(E)$. Let $\mathfrak{g}$ denote its Lie algebra, and let $\mathfrak{g}_u$ denote the Lie algebra of the stabilizer $G_u \le G$ of the order unit. The following formulation of the Koecher-Vinberg Theorem summarizes the construction of the Jordan product on $E$. See [30] for a proof (also, the Appendix to [18] contains a fairly detailed outline of the proof and some additional remarks pertinent to the precise version given here):

Theorem 17 (Koecher-Vinberg). Let $E_+$ be self-dual with respect to some inner product on $E$, and let $G$ be a closed, connected subgroup of $\mathrm{Aut}(E)$, acting transitively on the interior of $E_+$. Then

(a) It is possible to choose a self-dualizing inner product on $E$ in such a way that $G_u = G \cap O(E)$ (where $O(E)$ is the orthogonal group with respect to the inner product);

(b) If $G = G^{\dagger}$ with respect to this inner product, then $\mathfrak{g}_u = \{X \in \mathfrak{g} \mid X^{\dagger} = -X\} = \{X \in \mathfrak{g} \mid Xu = 0\}$, and $\mathfrak{g} = \mathfrak{g}_u \oplus \mathfrak{p}$, where $\mathfrak{p} = \{X \in \mathfrak{g} \mid X^{\dagger} = X\}$;

(c) In this case the mapping $\mathfrak{p} \to E$, given by $X \mapsto Xu$, is an isomorphism. Letting $L_a$ be the unique element of $\mathfrak{p}$ with $L_a u = a$, define

$$a \bullet b = L_a b$$

for all $a, b \in E$. Then $\bullet$ makes $E$ a formally real Jordan algebra, with identity element $u$.

In [IT], Jordan, von Neumann and Wigner classified the simple Euclidean Jordan algebras (every Euclidean Jordan algebra is a direct sum of simple ones) as belonging to one of two broad types, plus one exceptional example. These are

(a) Hermitian parts of matrix algebras over $\mathbb{R}$, $\mathbb{C}$ or $\mathbb{H}$, ordered as usual;

(b) Spin factors, in which the normalized state space is a ball of dimension $n$; and

(c) The exceptional Jordan algebra of $3 \times 3$ Hermitian matrices over the octonions.
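A spin factor can be realized on $\mathbb{R} \oplus \mathbb{R}^n$ with product $(s, v) \bullet (t, w) = (st + v \cdot w,\ s w + t v)$ and identity $(1, 0)$. This is the standard construction, though the code and its names are our own illustration. Its positive cone is the Lorentz cone $\{(s, v) : s \ge \|v\|\}$, whose slice at fixed $s$ is a ball. The sketch checks the unit law, commutativity and the Jordan identity on random vectors in $\mathbb{R} \oplus \mathbb{R}^4$.

```python
import random

def spin_product(x, y):
    """Jordan product on R (+) R^n: (s, v).(t, w) = (st + <v, w>, s w + t v)."""
    s, v = x
    t, w = y
    return (s * t + sum(vi * wi for vi, wi in zip(v, w)),
            [s * wi + t * vi for vi, wi in zip(v, w)])

def close(x, y, tol=1e-12):
    return abs(x[0] - y[0]) < tol and all(abs(p - q) < tol for p, q in zip(x[1], y[1]))

rng = random.Random(7)
n = 4
a = (rng.uniform(-1, 1), [rng.uniform(-1, 1) for _ in range(n)])
b = (rng.uniform(-1, 1), [rng.uniform(-1, 1) for _ in range(n)])
unit = (1.0, [0.0] * n)

a2 = spin_product(a, a)
print("identity:", close(spin_product(unit, a), a))
print("commutative:", close(spin_product(a, b), spin_product(b, a)))
print("Jordan identity:",
      close(spin_product(a2, spin_product(b, a)), spin_product(spin_product(a2, b), a)))
```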

Thus, it would seem that if we can motivate both homogeneity and self-duality in operational terms, we will go a long way towards obtaining an operational characterization of finite-dimensional QM. This problem is taken up in the next section. We then discuss the consequences of assuming that a monoidal probabilistic theory consisting of Jordan models has locally tomographic composites. Here a theorem of H. Hanche-Olsen [35] can be invoked to show that, so long as the theory contains even a single instance of the simplest quantum-mechanical system, a qubit, every system allowed by the theory must be quantum.
5.2 Motivating Homogeneity and Self-Duality



Let us call a model $A$ HSD (homogeneous and self-dual) iff its linear hull $E(A)$, or, equivalently, its dual $V(A)$, is homogeneous and self-dual. Why should this be the case? In this section, we discuss several possible answers.

Homogeneity. A model $A$ is uniform iff the state space $\Omega$ contains a uniform state $\mu$, i.e., one taking the constant value $1/n$ on all outcomes in $X(A)$. Of course, this implies that all tests in $\mathcal{M}(A)$ have cardinality $n$. For uniform systems, homogeneity of $E(A)$ has a straightforward, natural and physically reasonable interpretation: it asserts that every non-singular state should be preparable, by means of a probabilistically reversible transformation, from the uniformly (or maximally) mixed state. As noted above, homogeneity is also implied by either of the following conditions:

(a) Every interior state is the marginal of an isomorphism state 

(b) Every state is the marginal of a steering state. 

Yet another way of arriving at the homogeneity of $V(A)$ can be found in [70].

Self-Duality. Self-duality seems less clear-cut, but can be obtained as a consequence of certain symmetry assumptions. Perhaps the simplest and most dramatic is the following beautiful result due to M. Mueller and C. Ududec. Call two pure states $\alpha, \beta \in \Omega(A)$ sharply distinguishable by effects iff there exists an effect $a$ such that $\alpha(a) = 1$ and $\beta(a) = 0$. Mueller and Ududec call a system bit-symmetric iff every such pair of states can be mapped to any other such pair by a symmetry of the state cone, that is, an affine symmetry of $\Omega$. They then prove:

Theorem 18 ([SU]). If $\Omega(A)$ is bit-symmetric, then $V(A)$ (and hence $E(A)$) is self-dual.

It is worth noting that not every self-dual model is bit-symmetric. For instance, if $\Omega$ is a 2-dimensional regular $(2n+1)$-gon, then $V(\Omega)$ is self-dual, but $\Omega$ is not bit-symmetric. Bit-symmetry is thus a very restrictive, yet very plausible and operationally meaningful, constraint.

A more involved condition having a somewhat similar flavor, but dealing with the test space structure $\mathcal{M}(A)$ rather than the pure states of $A$, is worth mentioning. Call $A$ bi-symmetric iff it is 2-symmetric under $G(A)$ and $G(A)$ acts transitively on pure states. As discussed in Section 2.2, it is quite easy to construct such models one at a time. Recall that $A$ is sharp iff for every outcome $x$, there is a unique state $\alpha$ with $\alpha(x) = 1$.

Theorem 19 ([72]). Let $\mathcal{C}$ be a monoidal probabilistic theory in which every model is bi-symmetric. If $A \in \mathcal{C}$ is irreducible and sharp, then $E(A)$ is self-dual.

Another way of obtaining self-duality from bi-symmetry involves the notion of a conjugate system: 

Definition 24. A conjugate for a model $A$ is a structure $(\bar{A}, \gamma_A, \eta_A)$, where $\bar{A}$ is a model, $\gamma_A: A \to \bar{A}$ is an isomorphism, and $\eta_A$ is a bipartite state on some non-signaling composite $A\bar{A}$ such that

$$\eta_A(x, \gamma_A(x)) = 1/n$$

for every $x \in X(A)$. We'll call $\gamma_A$ the conjugation map and $\eta_A$ the correlator for the given conjugate.

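Example 10. Let $A = A(\mathcal{H})$ be the quantum model associated with a complex Hilbert space $\mathcal{H}$, and $\bar{A} = A(\bar{\mathcal{H}})$ the model associated with the conjugate Hilbert space. Define a mapping $\gamma_A: X(\mathcal{H}) \to X(\bar{\mathcal{H}})$ by $\gamma_A: x \mapsto \bar{x}$ (strictly speaking, the identity map!). Then, as discussed in Section 3.3, $\eta_A(x, \gamma_A(y)) = |\langle \Psi, x \otimes \bar{y}\rangle|^2 = \mathrm{Tr}(P_{\Psi} P_{x \otimes \bar{y}})$ is a correlator.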
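A minimal numerical sketch of Example 10, assuming (as is standard, though the formula for $\Psi$ is our assumption here) that $\Psi = n^{-1/2}\sum_i e_i \otimes \bar{e}_i$, and identifying $\bar{\mathcal{H}}$ with $\mathbb{C}^n$ so that $\gamma_A(y)$ is represented by the coordinate-wise conjugate of $y$. Then $\langle \Psi, x \otimes \bar{y}\rangle = n^{-1/2}\sum_i x_i \bar{y}_i$. The code checks that the correlator gives $1/n$ on matching outcomes and $0$ on orthogonal ones; the random vectors and helper names are illustrative.

```python
import math
import random

def correlator(x, y):
    """eta(x, gamma(y)) = |<Psi, x (x) conj(y)>|^2 with Psi = n^(-1/2) sum_i e_i (x) e_i
    in the coordinates of C^n (x) C^n."""
    n = len(x)
    amplitude = sum(xi * yi.conjugate() for xi, yi in zip(x, y)) / math.sqrt(n)
    return abs(amplitude) ** 2

def random_unit_vector(n, rng):
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in v))
    return [c / norm for c in v]

rng = random.Random(3)
n = 3
x = random_unit_vector(n, rng)

# Build a unit vector orthogonal to x by one Gram-Schmidt step.
z = random_unit_vector(n, rng)
overlap = sum(xi.conjugate() * zi for xi, zi in zip(x, z))   # <x, z>
y = [zi - overlap * xi for xi, zi in zip(x, z)]
norm_y = math.sqrt(sum(abs(c) ** 2 for c in y))
y = [c / norm_y for c in y]

print(correlator(x, x), 1 / n)   # matching outcomes
print(correlator(x, y))          # orthogonal outcomes
```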

If $A$ has a conjugate, then it has a conjugate for which the correlator $\eta_A$ is symmetric, in the sense that $\eta_A(x, \gamma_A(y)) = \eta_A(y, \gamma_A(x))$, and invariant, in the sense that $\eta_A(gx, \gamma_A(gy)) = \eta_A(x, \gamma_A(y))$ for all $g \in G(A)$.

16 One might raise the aesthetic objection that it is awkward to make special reference to the interior state. But it 
is difficult to see how this is any worse aesthetically than making special reference to, say, pure states. 
Indeed, $\eta^T(x, \gamma_A(y)) := \eta(y, \gamma_A(x))$ is again a correlator; averaging $\eta$ and $\eta^T$ gives us a symmetric correlator. If $\eta$ is symmetric, then for all symmetries $g \in G(A)$, $\eta^g(x, \gamma_A(y)) := \eta(gx, \gamma_A(gy))$ is again a symmetric correlator; averaging over $G(A)$ yields an invariant symmetric correlator. Henceforth, we assume that correlators are symmetric and invariant. It follows that the bilinear form

$$\mathfrak{B}(a, b) := \eta(a, \gamma_A(b))$$

is orthogonalizing, meaning that $\mathfrak{B}(x, y) = 0$ for all $x \perp y$ in $X(A)$. For the following, see [72]:

Theorem 20. Let $A$ be irreducible, bi-symmetric, and have a conjugate $(\bar{A}, \gamma_A, \eta_A)$. Then (a) $\mathfrak{B}$ is an inner product on $E$, and (b) $A$ is self-dual with respect to $\mathfrak{B}$ iff $\eta_A$ is an isomorphism state iff $A$ is sharp.

5.3 HSD and Jordan Models 

Call a model $A$ HSD (homogeneous and self-dual) iff the cone $E(A)_+$ is homogeneous under some group $\mathcal{G}(A)$ of order-automorphisms, and self-dual with respect to some inner product. If $A$ is an HSD model, then by the Koecher-Vinberg theorem, $E(A)$ carries a unique Euclidean Jordan structure with respect to which the order unit, $u$, is the identity and $\langle a, u\rangle = \mathrm{Tr}(a)$.

An idempotent in a Jordan algebra $E$ is an element $e \in E_+$ with $e^2 = e \bullet e = e$. Idempotents in the special Jordan algebra $\mathcal{L}_h(\mathcal{H})$ are precisely orthogonal projection operators. A primitive idempotent is an idempotent that is not a sum of other non-zero idempotents; thus, in the context of $\mathcal{L}_h(\mathcal{H})$, a primitive idempotent is a rank-one projection operator. Any Euclidean Jordan algebra $E$ carries a canonical trace functional, with $\mathrm{Tr}(a \bullet b) = \langle a, b\rangle$, and one can show that $\mathrm{Tr}(e) = 1$ for any primitive idempotent $e$. A Jordan frame in a Euclidean Jordan algebra $E$ is a set $e_1, \dots, e_n$ of primitive idempotents summing to $u$. The Spectral Theorem for Euclidean Jordan algebras asserts that every $a \in E$ has a representation of the form $a = \sum_{e \in F} t_e e$ over some Jordan frame $F$, where $\{t_e \mid e \in F\}$ are real coefficients (the eigenvalues of $a$), and that $a \in E_+$ exactly when all the $t_e$ are non-negative. It follows that the extremal rays of the cone $E_+$ are exactly those generated by the primitive idempotents. The group of order-automorphisms of $E$ fixing the unit $u$ acts transitively on the set of Jordan frames, so all Jordan frames have the same size, the rank of $E$. (Indeed, when $E$ is simple, regarding the set of Jordan frames as a test space, this group acts fully transitively, i.e., any permutation of a Jordan frame can be implemented by an order-automorphism of $E$.)
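For the Jordan algebra of real symmetric $2 \times 2$ matrices, a Jordan frame is a pair of orthogonal rank-one projections summing to the identity. The sketch below (the matrix entries and helper names are illustrative) builds the frame from the eigenvectors of a positive symmetric matrix. It checks that each frame element is an idempotent of trace 1 and that the frame sums to the identity, then reconstructs the matrix as $\sum_e t_e e$ with the eigenvalues as coefficients.

```python
import math

def projector(theta):
    """Rank-one projection onto (cos theta, sin theta) as a 2x2 nested list."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def close(x, y, tol=1e-12):
    return all(abs(x[i][j] - y[i][j]) < tol for i in range(2) for j in range(2))

# A hypothetical positive symmetric matrix a = [[p, r], [r, q]].
p, q, r = 2.0, 1.0, 0.5
theta = 0.5 * math.atan2(2 * r, p - q)            # angle of the top eigenvector
mean, radius = (p + q) / 2, math.hypot((p - q) / 2, r)
frame = [projector(theta), projector(theta + math.pi / 2)]
coeffs = [mean + radius, mean - radius]            # eigenvalues t_e

identity = [[1.0, 0.0], [0.0, 1.0]]
for e in frame:
    assert close(matmul(e, e), e)                  # idempotent: e . e = e
    assert abs(e[0][0] + e[1][1] - 1.0) < 1e-12    # Tr(e) = 1
assert close([[frame[0][i][j] + frame[1][i][j] for j in range(2)] for i in range(2)], identity)

rebuilt = [[sum(t * e[i][j] for t, e in zip(coeffs, frame)) for j in range(2)] for i in range(2)]
print("a == sum_e t_e e:", close(rebuilt, [[p, r], [r, q]]), "coefficients:", coeffs)
```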

Definition 25. A probabilistic model $A$ is uniform iff its tests have a uniform cardinality $n$, and the uniformly mixed probability weight $\mu(x) = 1/n$ belongs to $\Omega(A)$.

If $A$ is an HSD model, then every primitive idempotent $e$ in $E(A)$ defines a pure state, $\langle e|$, and this is the unique pure state assigning probability 1 to the effect corresponding to $e$. By a Jordan model, we mean an HSD model $A$ such that every outcome in $X(A)$ is a primitive idempotent in $E(A)$, or, equivalently, every test is a Jordan frame. Evidently, such a model is unital, indeed sharp, and uniform.

There is a converse. Suppose $A$ is HSD. By an easy extension of the converse to the Krein-Mil'man theorem, any closed, generating subset of $V(A)_+$ contains a point on every extremal ray of $V(A)_+$. By our standing assumption of outcome-closure, the outcome-space $X(A)$ is closed in $E(A)_+$; by construction, it is also generating. Since $V(A)_+ \cong E(A)_+$, every extremal ray of $E(A)_+$ consists of multiples of an outcome. Giving $E(A)$ its standard Jordan structure, primitive idempotents generate extremal rays of $E(A)_+$, so every primitive idempotent in $E(A)$ is a positive multiple of an outcome in $X(A)$.

Lemma 8. Let $A$ be HSD, and let $E(A)$ have its canonical Jordan structure. Then:

(a) Every extremal unital outcome $x \in X(A)$ is a primitive idempotent.

(b) If A is uniform, then every unital outcome is extremal, hence, a primitive idempotent. 

(c) If A is both unital and uniform, it is a Jordan model. 
Proof: (a) Let $x \in X(A)$ be extremal. As discussed above, there then exists some $t > 0$ such that $tx =: e$, a primitive idempotent. Now suppose $f$ is a primitive idempotent representing a pure state of $E$, with $\langle f, x\rangle = 1$. Then

$$t = t\langle f, x\rangle = \langle f, tx\rangle = \langle f, e\rangle \le 1,$$

by the Cauchy-Schwarz inequality. Now notice that

$$t^2 \langle x, x\rangle = \langle e, e\rangle = 1,$$

so $\langle x, x\rangle = 1/t^2$. Choosing any $E \in \mathcal{M}(A)$ with $x \in E$, we now have

$$1 = \langle e, u\rangle = t\langle x, u\rangle = t\Big(\langle x, x\rangle + \sum_{y \in E \setminus \{x\}} \langle x, y\rangle\Big) \ge t\langle x, x\rangle = t/t^2 = 1/t,$$

so that $t \ge 1$. Thus $t = 1$, and $x = e$, a primitive idempotent.

(b) Let $x = \sum_i s_i x_i$, where the $x_i$ are extremal outcomes and $s_i \ge 0$. Let $\mu$ be the uniform state on $A$. Then

$$\frac{1}{n} = \mu(x) = \sum_i s_i \mu(x_i) = \sum_i s_i \frac{1}{n},$$

so $\sum_i s_i = 1$. If $x$ is unital, therefore, there exists a primitive idempotent $f$ with

$$1 = \langle f, x\rangle = \sum_i s_i \langle f, x_i\rangle.$$

Since the coefficients $s_i$ are convex and each $\langle f, x_i\rangle \le 1$, we have $\langle f, x_i\rangle = 1$ for every $i$ with $s_i \neq 0$. But then every such $x_i$ is a unital extremal outcome and so, by part (a), a primitive idempotent. It follows (again by the Cauchy-Schwarz inequality) that $s_i \neq 0$ implies $x_i = f$, whence $x = f$ is again a primitive idempotent. (c) now follows at once from (a) and (b). □



5.4 Composites of Jordan Models 

Suppose a probabilistic theory $\mathcal{C}$ consists entirely of Jordan models. Under what conditions can one equip $\mathcal{C}$ with an associative compositional structure so as to obtain a monoidal probabilistic theory? Subject to two further requirements, this is possible only if $\mathcal{C}$ is in fact a standard quantum theory:

Theorem 21 ([18]). Let $\mathcal{C}$ be a symmetric monoidal category of Jordan probabilistic models such that (i) for every $A, B \in \mathcal{C}$, the composite $AB$ is locally tomographic, and (ii) at least one system in $\mathcal{C}$ has the structure of a qubit. Then every model in $\mathcal{C}$ is the Hermitian part of a complex matrix algebra.

The proof of this result exploits the following theorem due to H. Hanche-Olsen. 

Theorem 22 (Hanche-Olsen). If $E$ is a JC-algebra and $M_2$ is the Jordan algebra of $2 \times 2$ Hermitian matrices over $\mathbb{C}$, then $E$ is the Hermitian part of a complex matrix algebra iff there exists a Jordan product on $E \otimes M_2$ such that

$$(a \otimes 1) \bullet (b \otimes 1) = (a \bullet b) \otimes 1 \quad \text{and} \quad (1 \otimes x) \bullet (1 \otimes y) = 1 \otimes (x \bullet y) \qquad (15)$$

for all $a, b \in E$ and all $x, y \in M_2$.

Essentially, [18] shows that if AB is a non-signaling HSD composite of HSD models A and B, then 
local tomography forces the Jordan product on E(AB) to satisfy (15). A key step is the following 
observation. 

Lemma 9. Suppose A is a Jordan model. Let AA be a non-signaling composite of A with itself. If 
AA is Jordan, then the trace form on E(AA) factors. 
Proof: By definition of a composite, if $x, y \in X(A)$, then $x \otimes y$ is an outcome in $X(AA)$. Since $x$ and $y$ are unital in $A$, $x \otimes y$ is unital in $AA$. Indeed, the pure product state $\langle x| \otimes \langle y|$ assigns $x \otimes y$ probability 1 (again, by definition of a composite). Hence, by part (b) of Lemma 8, $x \otimes y$ is a primitive idempotent in $E(AA)$. But then we also have $\langle x \otimes y | x \otimes y\rangle = 1$, and $\langle x \otimes y|$ is the unique pure state with this property. Hence $\langle x| \otimes \langle y| = \langle x \otimes y|$, so that

$$\langle x \otimes y | a \otimes b\rangle = \langle x|a\rangle \langle y|b\rangle$$

for all $a, b \in E(A)$. Since $X(A)$ spans $E(A)$, the same holds with arbitrary elements of $E(A)$ in place of $x$ and $y$, i.e., the inner product factors. □

Local tomography is a strong constraint on a probabilistic theory. The fact that real and quaternionic quantum mechanics are not locally tomographic should at least slightly temper our willingness
to adopt it. A classification of non-locally tomographic non-signaling composites of Jordan models 
is the subject of on-going work. 

6 Conclusion 

The framework we have sketched here for a post-classical probability theory has several virtues. It 
is conceptually conservative, mathematically straightforward, and easily accommodates free mathematical constructions, as well as the introduction of further structure (for example, one can readily topologize the concept of a test space; see [55]). Still, at present, what we have is indeed just
the sketch of a framework. Its further development offers many interesting opportunities. We close 
by mentioning five areas for further work. 

Quantum Axiomatics. As long as we restrict our attention to finite-dimensional probabilistic models, it seems that there are many different axiomatic packages (that is, many different clusters of plausible constraints) that locate orthodox QM, or its near environs, within the wild landscape of general post-classical probabilistic theories. In addition to the approach via homogeneity and self-duality, sketched in Section 5, there are various derivations of finite-dimensional QM in the spirit of Hardy's axioms [39], including work by Rau [55], Dakic and Brukner [26], Masanes and Mueller [48] and Chiribella, D'Ariano and Perinotti [23]. A different approach [35] exploits information geometry. There is also the completeness theorem of Selinger [BT] for dagger-compact categories. This is not even to mention the various axiomatic treatments of quantum theory given in the older quantum-logical literature. (This last has been criticized as being too "mathematical", but much of it becomes significantly simpler when specialized to the finite-dimensional case.) It would be of great interest to know how all of these various axiomatizations (most of which share at least a few assumptions) are related to one another. The mathematical framework developed here seems ideal for this task.

Infinite-Dimensional Models. Of even greater interest would be to extend the results of these efforts to infinite-dimensional settings. Individually, infinite-dimensional probabilistic models have been well-studied [27] [29], and tools are available for dealing with composites in this setting, too [67]. However, the line of argument developed in Section 5, depending as it does on the Koecher-Vinberg Theorem, does not generalize easily to the infinite-dimensional setting. Efforts in this direction are just getting underway [refs?], but there is a great deal more work to be done.

Quantum Field Theory. Algebraic quantum field theory associates an algebra of observables to each
open subset of spacetime. An obvious project would be to consider a probabilistic theory in which 
each such region is associated with a probabilistic model, subject to the constraint that the model 
associated with a union of spacelike separated regions be a non-signaling composite of the models 
associated with the regions individually. 
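The non-signaling condition invoked here can be made concrete for finite models. The Python sketch below is illustrative only and rests on assumptions not in the text: each model is taken to be a finite test space, given as a dict from test labels to lists of outcomes, and a joint state is a dict assigning a probability to each pair of outcomes for each pair of tests. The names `is_non_signaling`, `two_bit_tests`, `pr_box` and `signaling_box` are hypothetical. The check is the usual form of the condition: the marginal probability of any outcome on one side must not depend on which test is performed on the other. The Popescu-Rohrlich (PR) box passes this check, while a toy model in which A's outcome copies B's choice of test fails it.

```python
from fractions import Fraction
from itertools import product


def is_non_signaling(joint, tests_a, tests_b):
    """Return True if every one-sided marginal of `joint` is test-independent.

    Assumed representation (hypothetical, for illustration):
    tests_a, tests_b: dict mapping a test label to its list of outcomes.
    joint[(E, F)][(x, y)]: probability of outcome x of test E (on A) and
    outcome y of test F (on B) when E and F are performed together.
    """
    # A's marginal for outcome x of test E must be the same for every test F on B.
    for E, outcomes_e in tests_a.items():
        for x in outcomes_e:
            marginals = {
                sum(joint[(E, F)][(x, y)] for y in tests_b[F]) for F in tests_b
            }
            if len(marginals) > 1:
                return False
    # Symmetrically, B's marginal must not depend on the test performed on A.
    for F, outcomes_f in tests_b.items():
        for y in outcomes_f:
            marginals = {
                sum(joint[(E, F)][(x, y)] for x in tests_a[E]) for E in tests_a
            }
            if len(marginals) > 1:
                return False
    return True


def two_bit_tests():
    # Two binary tests labelled 0 and 1; each outcome is tagged with its test.
    return {t: [(t, 0), (t, 1)] for t in (0, 1)}


def pr_box():
    """PR box: output bits a, b satisfy a XOR b == E AND F, each with prob. 1/2."""
    tests = two_bit_tests()
    joint = {
        (E, F): {
            (x, y): Fraction(1, 2) if (x[1] ^ y[1]) == (E & F) else Fraction(0)
            for x, y in product(tests[E], tests[F])
        }
        for E, F in product(tests, tests)
    }
    return tests, joint


def signaling_box():
    """Toy signaling model: A's output bit equals the label of B's test."""
    tests = two_bit_tests()
    joint = {
        (E, F): {
            (x, y): Fraction(1, 2) if x[1] == F else Fraction(0)
            for x, y in product(tests[E], tests[F])
        }
        for E, F in product(tests, tests)
    }
    return tests, joint


if __name__ == "__main__":
    tests, joint = pr_box()
    print(is_non_signaling(joint, tests, tests))  # True
    tests, joint = signaling_box()
    print(is_non_signaling(joint, tests, tests))  # False
```

Exact `Fraction` arithmetic is used so that marginals can be compared for equality without a floating-point tolerance.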
Applications; Post-Quantum Information Theory. The notion of a probabilistic model is very broad.
It would likely be a fruitful exercise to look for applications outside of quantum information and the 
foundations of quantum mechanics in which models that are neither classical nor quantum arise. In 
anticipation of this, it would be very reasonable to further develop the post-classical information 
theory sketched in [T2J [52], especially by investigating in some detail such ideas as channel capacity
in this setting. 

The Measurement Problem. Even though we take measurements and measurement-outcomes as 
primitives, nothing prevents us from asking whether these can be modeled dynamically within the 
formal framework presented here. Certain versions of the measurement problem can be formulated 
as theorems in this framework, leading one to wonder whether various strategies for resolving the 
quantum measurement problem (e.g., some version of the "many worlds" interpretation, or the
apparatus of decoherence) have analogues in the setting of a general probabilistic theory. If so,
this would shed some light on how these interpretive moves work; if not, then requiring the existence of such
an analogue could serve as a further constraint on a probabilistic theory, taking us closer to
orthodox QM. A further discussion of these matters can be found in [71].